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f  We  show  that  numerical  differentiation  of  images  is  an  ill-posed  problem  in  the 
sense  of  Hadamard.  Differentiation  needs  to  be  regularized  by  a  regularizing 
filtering  operation  before  differentiation.  This  shows  that  this  part  of  edge 
detection  consists  of  two  steps,  a  filtering  step  and  a  differentiation  step.v 
Following  this  perspective,  the  paper  discusses  in  detail  the  following  theo-  \ 
retical  aspects  of  edge  detection:  | 

(1)  The  properties  of  different  types  of  filters  -  with  minimal  uncertainty, 
with  a  bandpass  spectrum,  and  with  limited  support-are  derived.  Minimal 
uncertainty  filters  optimize  a  tradeoff  between  computational  efficiency 

and  regularizing  properties. 

(2)  Relationships  among  several  2-D  differential  operators  are  established. 

In  particular,  we  characterize  the  relation  between  the  Laplacian  and  the 
second  directional  derivative  along  the  gradient.  Zero-crossings  of  the 
Laplacian  are  not  the  only  features  computed  in  early  vision. 

(3)  Geometrical  and  topological  properties  of  the  zero  crossings  of  dif¬ 
ferential  operators  are  studied  in  terms  of  transversal ity  and  Morse  theory. 

V'-We  discuss  recent  results  on  the  behavior  and  the  information  content  of 
zero  crossings  obtained  with  filters  of  different  sizes.  These  results 
imply  a  specific  order  in  the  sequence  of  filtering  and  differentiation 

operations.  Topological  properties  are  preserved  by  level-crossings _ 

Setting  a  level  in  the  optimal  filtering  stage  is  a  threshold  operation  - 
which  can  be  implemented  in  an  adaptive  way  -  that  preserves  all  the  "nice" 
geometrical  and  topological  properties  of  zero  crossings. 

Finally,  some  of  the  existing  local  edge  detector  schemes  are  briefly  out¬ 
lined  in  the  perspective  of  our  theoretical  results. 
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1.  INTRODUCTION 


Vision  begins  with  the  transformation  of  a  flux  of  photons  into  a  set  of  intensity  values  at 
an  array  of  sensors.  The  first  step  in  visual  information  processing  is  to  obtain  a  compact 
description  of  the  raw  intensity  values.  The  primitive  elements  of  the  initial  description 
should  ideally  be  complete  in  the  sense  of  representing  the  full  information  contained  in 
the  image,  and  meaningful  (that  is,  capturing  significant  properties  of  the  three-dimensional 
surfaces  around  the  viewer).  Physical  edges  are  one  of  the  most  important  properties  of 
objects  since  they  correspond  to  object  boundaries  or  to  changes  in  surface  orientation 
or  material  properties  (Ballard  and  Brown,  1982;  Binford,  1981,  1982;  Brady,  1981;  Canny, 
1983;  Davis,  1975;  Hildreth,  1980;  Marr  and  Hildreth,  1980;  Pavlidis,  1977;  Rosenfeld  and 
Kak,  1976). 

Three-dimensional  edges  are  often  mapped  by  the  imaging  process  into  critical  points  of 
the  two  dimensional  intensity  profile  formed  in  the  eye  or  in  a  camera.  The  ultimate  goal 
of  edge  detection  is  the  characterization  of  intensity  changes  in  the  image  in  terms  of 
the  physical  processes  that  originated  them.  For  instance,  a  shadow  may  be  distinguished 
from  an  occluding  boundary  and  material  properties  may  be  identified  from  the  associated 
intensity  changes.'  A  traditional  belief  in  computational  vision— that  we  fully  share— is  that 
this  goal  cannot  be  reached  in  a  single  step.  At  least  two  separate  stages  are  required. 
First,  one  needs  to  characterize  the  intensity  changes  in  the  image.  Second,  one  uses 
this  representation,  combined  with  high-level  knowledge,  to  make  assertions  about  the  3-D 
surfaces  and  their  properties. 

The  first  part  of  edge  detection  then,  requires  the  evaluation  of  derivatives  of  the  image 
intensity.  To  characterize  the  types  of  intensity  changes,  derivatives  of  different  type  and 
order  may  be  needed,  possibly  at  different  scales.  The  first  part  of  edge  detection  is  thus 
a  problem  in  numerical  differentiation.  In  this  paper,  we  will  consider  only  this  first  stage 
of  edge  detection  as  the  process  that  attempts  to  detect,  localize  and  characterize  local 
edges,  the  sharp  changes  in  intensity  that  are  natural  primitives  for  later  processing.  We 
will  not  consider  here  the  second  stage  of  edge  detection  that  includes  processes  such 
as  boundary  detection,  segmentation,  region  growing  and  groupings  of  local  edges  (that 
group  local  edge  elements  into  structures  better  suited  for  the  interpretation  of  image  data 
in  terms  of  the  underlying  physical  processes). 

In  this  paper,  we  begin  by  analyzing  the  problem  of  differentiating  a  sampled  image.  We 
show  that  differentiation  is  an  ill  posed  problem  (in  the  sense  of  Hadamard).  Well-posedness 
and  numerical  stability  of  the  differentiational  step  requires  the  regularization  of  the  image 
intensities  by  a  regularizing  filtering  operation  preceding  differentiation.  This  argument 
represents  a  novel  and  rigorous  justification  of  the  basic  sequence  of  filtering  and 
differentiation  that  can  be  recognized  in  all  existing  local  edge  detector  schemes.  We  then 
examine  in  detail  the  filtering  and  the  differentiation  stage.  We  continue  our  analysis  by 
characterizing  properties  of  the  critical  points  of  the  differentiation  operation. 

Our  main  practical  conclusions  in  this  paper  are  (a)  that  Gaussian  filtering,  although  not 
optimal  under  all  conditions,  is  near-optimal,  and  computationally  convenient:  (b)  the  choice 
between  rotationally  invariant  operators  (rotational  fillers  and  rotational  invariant  differential 
HID  operatois.  as  the  Laplacian  or  the  second  derivative  in  the  direction  ol  the  gradient) 
or  directional  (directional  tiller  and  diieclional  differential  DD  opeiatois)  opeiatois  (such 
as  directional  derivatives)  depends  on  the  subsequent  inhumation  processing  task.  HID 
operators  ensure  closed  edge  contours,  that  are  not  provided  in  general  by  DD  operators. 

We  now  outline  the  organization  of  this  paper  in  more  detail. 

'  the  ir.e  ol  color  mfoi illation — which  wo  will  not  discuss  ill  llns  paper  -is  a  natural  extension 
willnri  till.-.  Irameuoik. 


? 


•  *  •/ v  "  •  v*  •**•*«•'  •  *  * 


Tone.  Poggio 


On  Edge  Detection 


1.1.  Organization  of  the  paper 


In  this  paper,  we  consider  edge  detection  as  the  process  of  computing  derivatives  in  the 
two  dimensional  intensity  image.  In  Section  2,  we  show  that  the  problem  of  differentiation 
of  a  sampled  image  is  ill- posed.  We  prove  that  filtering  of  the  image  prior  to  differentiation 
is  necessary  for  regularizing  the  problem  and  make  it  well- posed.  The  filtering  step  is 
analyzed  in  Section  3.  Filters  with  minimal  uncertainty  (Hermite  and  Gabor  functions),  with 
bandpass  properties  (sine  and  prolate  functions)  and  others  that  are  support- limited  are 
reviewed.  Filter  with  minimal  uncertainty  tend  to  optimize  the  trade  off  between  band  limited 
characteristics  (required  for  a  correct  sampling  and  for  “regularizing"  the  differential 
operat  on)  and  computational  efficiency. 

Section  4  is  devoted  to  the  differential  stage.  We  consider  separately  the  second  order  RID 
and  Dl)  operators  and  analyze  their  main  properties.  The  main  focus  is  on  the  localization 
of  the  zeros  of  the  Laplacian  V'2,  the  second  derivative  along  the  gradient  and  the 
usual  second  order  partial  derivatives.  Section  5  considers  the  geometrical  structure  of 
the  contours  formed  by  edge  detectors  and  in  particular  their  closure  property.  For  this 
purpose,  we  use  elementary  tools  from  Morse  and  Thom  theories.  The  problem  of  the 
geometry  of  contours  across  different  spatial  scales — where  scale  is  parameterized  by  the 
size  of  the  filter — is  considered  in  Section  6.  A  comparison  of  the  results  of  our  study  with 
several  previously  proposed  edge  detectors  is  given  in  Section  7,  and  a  discussion  of  the 
“best"  filtering  and  differential  steps  is  given  in  the  final  section. 


2.  COMPUTING  DERIVATIVES  OF  IMAGES 

In  this  chapter,  we  consider  the  problem  of  computing  (spatial  or  temporal)  derivatives 
of  sampled  intensity  images.  Our  main  result  is  a  rigorous  justification  of  filtering  before 
differentiation  in  terms  of  the  theory  of  regularization.  Our  approach  also  clarifies  the  issue 
of  the  optimal  filter  for  edge  detection.  In  practice,  it  justifies  the  use  of  suitable  derivatives 
of  gaussianlike  filters  in  edge  detection  (for  linear  differential  operators). 

In  the  first  section,  we  discuss  the  ill  posed  nature  of  differentiation,  which  is  equivalent  to 
its  lack  of  robustness  against  noise  in  the  input  data.  In  section  2.2.  we  review  the  main 
technic  lies  for  transforming  differentiation  into  a  well  posed  problem.  Section  2.3  shows 
that  numerical  differentiation  can  be  regularized  via  previous  convolution  of  the  image  data 
with  an  appropriate  filter.  In  section  2  4.  we  consider  the  application  of  two  of  the  general 
regular, .ation  techniques,  and  show  that  they  load  to  spline  interpolation  and  to  spline 
approximation  respectively  (prior  to  the  differentiation  stage).  In  these  methods,  regularized 
ditl.  i- 'filiation  is  thus  performed  by  convolving  the  data  with  an  appropriate  derivative  of 
the  regulaii/mg  filter.  In  some  situations,  however,  it  may  be  more  convenient  in  practice 
to  In  :•!  filler  the  data  and  then  differentiate  the  results.  We  consider  some  implications 
ol  this  situation  in  Appendix  1  I  he  problem  of  sampling  appropriately  the  image  prior  to 
tilt  iing  and  difh.-ientiation  is  ills, cussed  in  Appendix  2.  Interpolation,  approximation  and 
dills,  (,  filiation  are  discussed  in  Appendix  4. 

2.1.  Ill-posed  nature  of  differentiation 

In  ma,  tune  vision,  as  well  as  in  most  numerical  problems,  the  data  ate  noisy.  Noise  in 
the  phatolian.  duction  piocess  is  ultimately  unavoidable,  beusor  noise  arises  at  least  in 
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part  from  quantum  fluctuations  in  the  number  of  absorbed  photons  per  sensor  and  unit 
time.  This  represents  a  fundamental  limitation  for  real  time  imagery  when  integration  time 
and  size  of  the  sensors  are  limited  by  the  need  of  high  temporal  and  spatial  resolution. 
It  is  critically  important,  therefore,  that  the  results  of  numerical  operations  performed  on 
the  data  are  not  too  sensitive  to  noise.  It  is  well  known  that  differentiation  is  not  robust 
against  noise.  Even  a  small  amount  of  noise  may  disrupt  differentiation.  Let  us  consider 
a  function  f(x)  and  f(x)  —  f{x)  t  <  sinuw.  f(x)  may  be  close  to  f(x)  according  to  standard 
norms  (//■’, //*...),  provided  <  is  sufficiently  small.  On  the  other  hand,  f'(x)  may  be  quite 
different  from  /  (r)  if  u>  is  large. 

In  the  beginning  of  this  century,  Hadamard  (1923)  defined  a  mathematical  problem  to  be 
well-posed  if  its  solution 

(a)  exists 

(b)  is  unique 

(c)  depends  continuously  on  the  initial  data  (this  is  equivalent  to  saying  that  the  solution  is 
robust  against  noise). 

Most  of  the  problems  of  classical  physics  are  well-posed  in  this  sense,  and  Hadamard 
argued  that  meaningful  physical  problems  had  to  be  well-posed. 

Now  differentiation  of  the  function  f(x)  is  a  typical  ill- posed  problem,  since  it  can  be  seen 
as  the  solution  to  the  inverse  problem 


<j(x)  =  Af(x) 


[2.1] 


where  AJ{x)  is  the  integral  operator 


vT 

-vX- 


h(x  —  x)f(x)ilx 


[2.2] 


where  h  is  the  step  function.  It  is  well  known  that  inverse  linear  problems  in  which  <j(x)  and 
J(x)  belong  to  Hilbert  space  are  ill-posed  (Tikhonov  and  Arsenin,  1977;  Bertero.  1981). 


2.2.  Regularization  techniques 

Rigorous  methods  for  transforming  ill-posed  problems  into  well-posed  problems  have  been 
developed  over  the  past  years  (see  especially  Tikhonov,  1963:  Tikhonov  and  Arsenin,  1977; 
and  Nashed,  1974.  1976).  Regularization  of  the  ill-posed  problem  of  finding  ;  from  the  data 
i/,  such  that  4:  -  y  requires  the  choice  of  suitable  norms  ||-||.  (usually  quadratic)  and  of 
a  stabilizing  functional  ||/’i||.  The  choice  of  the  stabilizing  functional  and  of  the  norms  is 
dictated  by  mathematical  considerations,  and  most  critically,  by  an  analysis  of  the  physical 
constraints  on  the  problem.  There  are  three  main  methods  of  standard  regularization 
(Bertero,  1981): 

(1)  Among  s  that  satisfy  |I/';|J  <  C,  (where  (\  is  a  constant)  find  z  that  minimizes 


4  c 


(2)  Among  -  that  satisfy  ]]4:  ;/||  -  ('•  find  r  that  minimizes 


ll/'-H 


and  (3)  Find  ;  that  minimizes 


II  I'  //II*  •  X||/':||\ 


•1 
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where  X  is  a  regularization  parameter. 

The  first  method  consists  of  finding  the  function  c  that  satisfies  the  constraint  ||/’j||  <  C (, 
and  best  approximates  the  data.  The  second  method  computes  the  function  ;  that  is 
sufficiently  close  to  the  data  and  is  most  "regular".  In  the  third  method,  the  regularization 
parameter  x  controls  the  compromise  between  the  degree  of  regularization  of  the  solution 
and  its  closeness  to  the  data. 

Differentiation  can  also  be  regularized  using  the  stabilizing  operators  introduced  by  Tikhonov 
(Tikhonov  and  Arsenin,  1977;  Bertero,  1981).  In  the  case  of  differentiation  these  operators 
are  equivalent  to  filtering  the  data  with  low-pass  filters  of  the  kind  we  will  discuss  in  chapter 
3. 

In  the  next  section,  we  show  how  to  use  method  2  and  3  directly  for  solving  the  ill-posed 
problem  of  numerical  differentiation.  In  section  2.4,  we  will  consider  a  wide  class  of 
regularizing  filters  that  correspond  to  Tikhonov  stabilizing  operators  and  can  be  used  to 
make  numerical  differentiation  well- posed. 

2.3.  Regularizing  differentiation  with  interpolating  and  approximating  splines 

Poggio,  Voorhees  and  Yuille  (1984/  have  recently  applied  the  second  and  the  third 
regularizing  methods  to  the  problem  of  edge  detection.  Following  Schoenberg  (1964)  and 
Reinsch  (1962),  they  chose  for  /’  the  simplest  form  of  Tikhonov’s  stabilizing  functionals 
with  /'  £,  and  the  usual  norm.  This  choice  corresponds  to  an  "a  priori”  constraint 

of  smoothness  on  the  intensity  function.  Its  physical  justification  is  that  the  noiseless  image 
has  to  be  smooth  in  the  sense  that  all  its  derivatives  must  exist  and  be  bounded  because  the 
image  is  band-limited  by  the  optics.  Physically,  this  constraint  of  smoothness  allows  us  to 
eliminate  effectively  the  noise  that  creeps  in  after  or  during  the  sampling  and  transduction 
process,  and  makes  the  operation  of  differentiation  unstable  and  ill- posed.  This  is,  of  course, 
not  the  only  stabilizing  functional  for  this  problem,  as  we  will  see  in  the  next  section,  but 
it  is  probably  the  simplest  one. 

Let  us  now  consider  in  more  detail  for  the  second  and  third  regularization  methods.  Consider 
a  function  /(*)  defined  in  [a, b]  and  be  A  -■=  «  <  x„  <  x, ...rn  l>  a  mesh  of  distinct  points, 
and 

/*  =  /(**)  m 

the  values  of  /(*)  at  xk.  Given  the  sample  points  of  /*,  the  problem  of  computing  the 
numerical  derivative  f'k  at  xk  is  ill-posed.  The  second  regularizing  method  leads  (usig  the 
stabilizing  operator  /'  j1’:  and  the  /,*  norm)  to  the  search  of  a  function  S(/)  such  that 

(a) 


S(xk)  fk  k  I . r» 

and  (b)  ||/’.s(r)||  is  minimized.  The  stabilizing  functional  /’  is 


1*7] 

1**1 


The  solution  to  this  problem  is  given  by  the  cubic  spline  >.\(j-)  which  interpolates  /(.<■)  in 
^  (Ahlberg.  Nils  on  &  Walsh.  1967)  As  a  consequence,  the  numerical  derivative  j'k  will  be 
the  value  of  •S\U)  in  xk.  For  equidistant  points  the  following  equation  holds 
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where  h  is  the  sampling  period,  and 


"(•r) 


[2.9] 


that  is 


J'k  =  ^I.8W(A+,  -  A-,)  -  •2l.r»(A+a  -  A-*)  +  .0577(A+s  -  A-j).  •  •]  [2-10] 

Poggio  et  al.  (1984)  have  obtained  the  following  theorem  which  is  a  reformulation  ol  results 
due  to  Schoenberg  (1946,  1964): 

Theorem: 77ie  cubic  spline  interpolating  the  data  points  assumed  on  a  regular  lattice  and 
satisfying  the  second  regularizing  method  with  l‘  —  ~  can  be  obtained  by  convolving  the 
data  points  with  the  cubic  spline  filter,  which  corresponds  to  the  IS  function  of  Schoeberg 
(1946). 

Numerical  differentiation,  therefore,  can  be  regularized  for  exact  data  on  a  regular  grid  by 
convolving  the  data  points  with  the  first  derivative  of  the  I*  filter  given  by  Schoenberg, 
which  is  a  cubic  spline. 

In  the  case  of  non-exact  data  which  is  the  most  natural  situation,  the  third  regularizing 
method  has  to  be  used  leading  to  the  problem  of  finding  S(x)  such  that 

E(A-^(xk))2  +  x/|S’"(x)|2«ix  [2.11] 

*=i  J 

is  minimum.  Both  Schoenberg  (1964)  and  Reinsch  (1967)  proved  that  approximating  cubic 
splines  are  the  solution  to  this  variational  problem.  Poggio  et  al.  (1984)  have  proved  the 
following  result: 

Theorem:7he  solution  to  the  variational  problem  (2.11]  in  the  case  of  inexact  data  on  a 
regular  grid  (and  appropriate  boundary  conditions),  can  be  obtained  (a)  by  convolving  the 
data  with  a  filter,  (b)  which  is  a  cubic  spline,  and  (c)  which  is  very  similar  to  a  gaussian. 

This  implies  that  regularized  differentiation  of  image  data  can  be  performed  by  convolving 
the  data  with  the  first  derivative  of  a  cubic  spline  filter,  which  is  very  close  to  the  gaussian, 
as  shown  in  figure  1. 


This  result  probably  is  the  simplest  and  most  rigorous  proof  that  a  gaussian-like  filter 
represents  the  correct  operation  to  be  performed  before  differentiation  for  edge  detection. 
We  refer  to  the  paper  by  Poggio  et  al.  (1984)  for  a  detailed  proof  of  this  result  and  for  a 
comparison  between  the  optimal  filter  and  the  gaussian.  Poggio  et  al.  (1984)  also  analyze 
the  role  of  the  regularizing  parameter  x,  its  connection  to  the  optimal  scale  of  the  filter, 
and  discuss  methods  for  finding  the  optimal  X. 

2.4.  Regularizing  filters 

In  the  previous  section  we  have  seen  that  differentiation  can  be  regarded  as  the  inverse 
problem  of  the  integral  equation 

f/U)  /  x  /(**»  I*'-’] 

where  /(.r)  must  be  recovered  from  the  knowledge  of  the  data  <:(.r).  which  is  usually 
given  only  on  a  discrete  lattice.  This  problem  is  ill-posed,  and  can  be  regularized  by  the 
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a) 


Effect  of  the  smoothing  parameter  X  on  R1 

Figure  1  a)  The  convolution  filter  obtained  by  regularizing  the  ill-posed  problem  of  edge 
detection  with  method  (III)  (see  Poggio  et  a!.,  1984).  It  is  a  cubic  spline  (solid  line),  very 
similar  to  a  gaussian  (dotted  line),  b)  The  first  derivative  of  the  filter  for  different  values  of 
the  regularizing  parameter  X,  which  effectively  controls  the  scale  of  the  filter  (from  Poggio 
and  Torre,  1984). 


regularizing  methods  previously  mentioned.  Furthermore,  Tikhonov  and  Arsenin  (1977,  see 
also  Bertero.  1981)  have  proved  that  it  is  in  general  possible  to  regularize  inverse  problems 
by  using  Tikhonov’s  stabilizing  operators.  For  equations  of  the  convolution  type  as  equation 
12.12],  the  stabilizing  operators  correspond  to  convolving  f/(x)  with  a  filter  A'(x,  n),  (where 
r»  >  (I  is  a  parameter)  whose  Fourier  transform  /<’(w,  <»)  satisfies  the  following  conditions: 

(Cl)  h'(ui.n)  is  bounded  for  <»  >  0  and  all  w. 

(C2)  /•'(*>,<*)  is  an  even  function  with  respect  to  u>,  and  it  belongs  to  /,•[(- a o,  +oo), 

(C3)  /■’(«>, <») ju/  belongs  to  l  >(-o o,  too). 

(C4)  For  every  «  >  0  it  holds  li"i|w|,-.x-  £'(“'>")  -  0. 

(C5)  «)  ►--*  I  as  •-*  0  and  i'\ w,0)  —  I. 

This  regularizing  filter  is  equivalent  to  a  smooth  low  pass  filter.  In  the  next  chapter,  we  will 
discuss  three  different  classes  of  low  pass  filters  that  have  been  used  for  edge  detection. 
The  first  two  of  them  fully  satisfy  the  previous  conditions  (C1-C5),  and  are  therefore 
regularizing  filters  in  Tikhonov  s  sense.  As  a  final  remark,  it  is  interesting  to  notice  that 
this  regularizing  filters  usually  correspond  to  the  solution  of  variational  principles  of  the 
type  provided  by  the  third  regularization  method  with  an  appropriate  stabilizer  /’  (compare 
Tikhonov  and  Arsenin,  1977,  page  121). 


3.  FILTERING 

In  this  section,  we  will  make  some  preliminary  observations  on  filtering  and  then,  we  will 
review  three  kinds  of  low  pass  filters,  which  have  been  used  in  machine  vision  for  edge 
detection.  Wo  will  consider  bandpass  filter  in  section  3.1.  support  limited  filters  in  section 
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3.2  and  minimal  uncertainty  filters  in  section  3.3.  Our  conclusion  is  that  bandpass  filter  as 
well  as  minimal  uncertainty  filters  are  good  regularizing  operators  for  differentiation  in  the 
sense  of  Tikhonov,  while  support-limited  filters  are  only  marginally  useful. 

As  in  the  study  of  functions  in  analysis,  many  properties  of  intensity  changes  can  be 
characterized  in  terms  of  zeros  of  appropriate  derivatives.  For  instance,  one-dimensional 
step  edges  in  intensity  correspond  to  extrema  of  the  first  derivative,  whereas  root  edges 
correspond  to  zeros  in  the  first  derivative.  The  main  goal  of  the  filtering  and  differentiation 
stage  in  edge  detection  is  to  produce  a  representation  of  zeros  and  extrema.  Interestingly, 
the  type  of  derivative  —  whether  directional  or  rotationally  invariant  —  and  the  type  of 
representation  —  whether  zeros  or  extrema  —  dictate  some  general  properties  of  Ihe  filter 
to  be  used.  We  will  now  briefly  discuss  these  two  points. 

The  first  point  is  obvious:  directional  derivatives  require  one-dimensional  filters  properly 
oriented  along  the  chosen  direction;  when  rotationally  invariant  operators  are  used,  the 
filter  /  is  a  function  of  the  radial  coordinate  p. 

We  restrict  ourselves  to  examine  linear,  space  invariant  filters.  Since  isotropy  can  be 
assumed,  the  shape  of  filters,  when  viewed  one-dimensionally,  is  an  even  or  odd  function. 
Let  us  now  consider  the  implications  of  this  for  the  case  of  step  intensity  edges.  Because 
of  the  arguments  developed  in  the  previous  section,  we  detect  intensity  edges  from  the 
zeros  of  a  suitable  derivative  of  the  filtered  intensity  profile  (i.e.  its  critical  points).  If  the 
shape  cf  the  step  edge  to  be  detected  is  A'(x),  defined  as 


X 

X 


>  0 

<  0’ 


then  the  output  </(x)  of  the  convolution  f(x)  *  S(x)  where  /(x)  is  the  filter,  will  be 


g(r)  =  V{*)  -  F{~oo),  [•'Ll] 

with  F(x)  the  integral  primitive  of  f(x).  Therefore: 

•The  extrema  of  <j(x)  correspond  to  the  zeros  of  /(x). 

•The  zero-crossings  of  correspond  to  the  extrema  of  /(*). 

Three  consequences  can  be  derived  from  these  observations: 

1.  If  we  are  interested  in  the  extrema  of  the  output  <j(x),  and  if  we  want  to  have  an  extremum 
located  at  the  position  of  the  edge,  then  f(x)  must  be  an  odd  function. 

2.  If  we  are  interested  in  the  zero  crossings  of  <i(x),  and  if  we  want  to  have  a  zero-crossing 
located  at  the  position  of  the  edge,  then  f(x)  must  be  an  even  function. 

3.  If  we  are  interested  in  the  extrema  or  zero-crossings,  and  if  f(x)  has  many  zero-crossings, 
we  will  have  many  secondary  extrema  or  zero-crossings.  To  avoid  false  edge  detection, 
f(x)  should  have  the  least  number  of  zero  crossings,  and  the  optimal  situation  would  then 
be: 

•If  f{x)  is  odd.  then  f(x)  has  only  one  zero. 

•If  /( jr)  is  even,  then  f(x)  has  no  zero. 

3.1.  Band-limited  filters. 

Band  limited  filters  are  an  obvious  choice  for  regularizing  differenliation.  since  the  simplest 
way  to  avoid  harmful  noise  is  to  lilter  out  high  frequencies  that  are  amplified  by  differentiation. 
Linear  and  circular  prolate  functions  constitute  an  especially  interesting  class  of  band-limited 
fillers  (T rieden.  1071):  Landau  and  Pollack.  19Ct).  Linear  prolate  functions  <,  „(.r)  are  defined 
by  Ihe  relation 
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where  X„  are  called  "linear  prolate  eigenvalues".  From  [3.2],  we  see  that  %\(x)  depends 
on  two  parameters,  j •„  and  It,  whose  significance  will  be  seen  later.  The  value  of  X„  is  a 
function  of  r  x,M  and  may  be  written  as  X„(j-,r).  </'„(x)  depends  on  The  main  properties 
of  v„( ••)  are 

(1)  v„( j-)  are  band-limited. 

(2)  o„|.r)  are  orthogonal  on  both  the  interval  |  j-,i ,  j-(I)  and  |  oo,  t  oo),  with 


/ 

/ 


>l’n(x)ij',„(x)dx 

'l’n(x)fm(x)Jx 


Xji^ntn 

burn 


[:u] 


(3)  (,  »,(■<■)  form  a  complete  set  of  functions  of  the  space  of  band  limited  functions  whose 
Fourier  transform  /'(w)  is  /■'( u;)  0  for  |cj|  >  ft. 

In  the  defining  expression  [3.2]  of  there  are  three  constants,  1),  X„  and  x„.  From 

[3.3|.  we  see  that  It  is  the  cutoff  frequency  and  from  [33],  x„  is  half  the  length  of  the 
finite  interval  over  which  linear  prolate  functions  r„(j)  are  orthogonal.  From  [3.3],  we  also 
see  that  X„  represents  the  fraction  of  energy  of  <i\,(.r)  within  | j-|  x„.  The  dependence  of 
X„  on  r  .r,,!l  is  shown  in  Figure  4.3  of  Frieden  (1971).  Therefore,  once  we  have  chosen 
11.  we  can  find  and  consequently  x„  such  that  the  energy  of  t is  almost  completely 
contained  in  \x\  <  x„. 

Linear  prolate  functions  have  the  nice  property  that  the  band  limited  function  with  cutoff 
frequency  11  and  maximal  energy  concentrated  in  j  .r„,.r„|  is  </„(.r)  with  <■  fix,,.  Similarly, 
the  odd  band  limited  function  with  cutoff  frequency  !!  that  has  maximal  energy  concentrated 
in  x x..  is  i,  d  r)  with  <•  si  .r,  .  Linear  prolate  functions  are  also  useful  for  solving  the 
inverse  problem;  that  is.  the  strictly  support  limited  function  in  |  j„,.r„|  that  has  maximally 
concentrated  frequencies  in  is 


/  i>j„  v-..( j-.  <•)  <•  x{,  ■  ii,  |:m] 

where  is  the  operator  defined  as 

k  ju)  {  /w  j;|  v_ .  m 

These  results  show  clearly  the  difference  between  j  >  and  sinc(j).  They  are  both 
band  limited,  but  c ,  ( j  )  falls  oil  more  rapidly  than  siiic(j-)  (see  Fig.  2  of  Landau  and  Pollack 
1961)  On  the  other  hand,  the  strictly  support  limited  function,  which  has  the  minimal  spread 
of  frequencies,  is  not  a  Hoar  function  or  a  Difference  Of  Boxes  filter  (see  later)  but  is 
/b  ■('  ) 

Oscillations  in  the  filter  may  produce  ringing  phenomena  in  the  edge  detection  process. 
To  reduce  these  phenomena  it  is  necessary  to  have  maximal  energy  of  e,.(.r)  (or  i.",(j)) 
conc  entrated  in  the  main  let  e  With  a  value  of  ■-  equal  to  7.  more  than  99  per  cent  of  the 
energy  of  <.  ..(./ )  and  i. -,(.»)  is  concentiated  in  |  x.,,x„\. 

It  is  immediate  to  verify  that  tiamll'miled  filters  satisfy  all  conditions  (see  section  3)  of 
likhonov  in  order  to  regulan/e  dillerentiation 

II  we  .  re  mt  n  ..led  in  rot  ition. illy  invariant  two  dimensional  filters  that  are  hand  limited, 
we  can  simply  like  even  linear  prolate  functions  e„(i).  n  (1.  t  and  substitute  x  with 


V 
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\jt 2  +  y-  —  p.  Now  V'«(/0  is  a  band-limited  function,  but  does  not  have  the  two-dimensional 
analog  of  properties  (1)-  (3).  These  properties  are  satisfied  by  the  circular  prolate  functions 
defined  by  relation: 

["  J^p)*n[p)pdp  (•*-«! 

where  ./„  is  the  Bessel  function  of  order  zero. 

3.2.  Support-limited  filters 

All  real  filters  have  a  finite  extension  and  are  support-limited.  Computational  efficiency 
requires  that  the  support  of  a  filter  is  as  compact  as  possible.  Therefore  it  is  interesting  to 
investigate  the  properties  of  filters  with  strictly  limited  support.  The  simplest  even  filter  with 
a  strictly  limited  support  and  with  unitary  energy  is 

=  I/n/2 T)  |*|  <l) 

-=0 

whose  Fourier  transform  F(ui)  is 

'-'M  = 

In  two  dimensions,  we  have 

whose  Fourier  transform  is 

(3-8] 

This  kind  of  filtering  represents  the  well-known  “blurring”  of  the  image  through  a  circular 
aperture  of  radius  p„. 

It  is  important  to  observe  that  this  class  of  support-limited  filters  fails  to  satisfy,  in  a  strict 
sense,  the  five  conditions  of  section  2.4.  In  particular,  condition  (3)  «»)jw  belongs  to 

oo ,  -+  oo )  is  not  satisfied  (because  differentiation  introduces  back  high  frequencies  in 
the  same  amount  as  they  are  removed  by  this  type  of  filtering).  Thus,  support  limited  filters 
are  not  good  regularizing  filters  in  the  sense  of  Tikhonov.  Nonetheless,  this  class  of  filters 
can  be  still  considered  as  regularizing  operators  in  a  weak  sense. 

If  we  are  interested  in  odd  filters,  the  simplest  support-limited  filter  is 


1*1  >  »’ 


sin  u>l) 


(3-7) 


/(*) 


0 

I 

v'-’/J 

\ 

\\ll) 


M  >  » 

0  <  x  <  I) 

I 

I)  <  X  <  0 


(3.S«( 


whose  Fourier  transform  is 


\XHh\ 


10 


,N, 


v.v. 


a;*  /■ 
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This  filter  has  already  been  proposed  by  Herskovitz  and  Binford  (1970)  and  is  commonly 
called  DOB  (Difference  of  Boxes).  It  is  also  a  Haar  function  (see  Fig.  19  of  Harmuth  1972 
and  page  399  of  Kolmogorov  and  Fomine,  1974).  The  system  of  Haar  functions  is  complete 
and  constitutes  a  basis  for  all  square  integrable  functions  on  a  bounded  interval.  This 
property  may  have  some  relevance  in  the  context  of  image  pro*.  „rsing  with  Haar  functions. 
Support-limited  filters  that  are  even  functions  can  be  easily  extended  in  two  dimensions 
by  a  simple  rotation  around  the  origin.  A  complete  set  of  support-limited  functions  in  two 
dimensions,  which  can  be  used  as  filters,  is  the  Haar  system  with  two  variables  (see 
Harmuth  1972).  The  Haar  function  of  equation  (3.8]  has  the  nice  property  of  being  the 
optimal  support-limited  filter  that  maximizes  the  signal-to- noise  ratio  for  an  ideal  step  edge, 
S(s).  It  is  easy  to  see  that  spatial  spread  of  f(x)  favors  the  signal-to-noise  ratio,  while 
spatial  concentration  favors  localization,  for  instance  of  zero-crossings.  This  can  be  seen 
as  another  formulation  of  the  uncertainty  relation  (Canny,  1983). 

3.3.  Filters  with  minimal  uncertainty 

In  the  two  previous  sections,  we  analyzed  band -limited  and  support-limited  filters!  Band- 
limited  filters  have  theoretically  infinite  support.  A  drawback  of  support-limited  filters  is  that 
they  are  regularizing  only  in  a  weak  sense.  It  is  natural  then  to  try  to  find  an  optimal 
compromise  between  these  two  types  of  filters.  A  measure  of  the  spread  of  a  function 
/  e  /,2(!R)  in  the  space  and  frequency  domain  is  the  uncertainty  A(/,  defined  as: 

A  U  =  HA',  [3.9] 

where 

,  /+°?(x  -  x)2f2(x)dx 

X2  =  x-i-  -■  -----  3.10 

/-*/*(*H* 


xf2(x)dx 


[3.11] 


n2  =  ~ 


/•'(a. )  is  the  Fourier  transform  of  /(*)  and 

r+x 

J -OP 


[3.12] 


[3.13] 


Notice  that  ft2  is  proportional  to  the  density  of  of  zero-crossings  for  Gaussian  white  noise 

(Papoulis.  1962:  Papoulis.  1965.  p.  487).  It  is  well-known  that  the  Gaussian  function  < 
is  the  real  function  /  <.  /,-(»)  that  minimizes  the  uncertainty  A //.  On  these  grounds  it  has 
been  proposed  by  Marr  &  Hildreth  (1980)  as  the  optimal  filter.  The  uncertainty  of  an  even  or 
an  odd  function  /  i  /.'(if)  can  be  easily  computed  it  its  representation  in  terms  of  Hermite 
functions  is  known;  that  is,  if  we  know  the  set  of  r„  such  that: 


I  \ 

/(•I  )  ^  j  'r'„ (  I"), 

„  II 


3.11] 


^  *-■ *- 


!ai£t 
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where 


¥?„(*)  —  ‘  H n(*)  13.15] 

//„(j-)  is  the  Hermite  polynomial  of  order  «.  The  uncertainty  of  <fiu(x)  is  simply  n  t-  [.  If  f(x) 
is  an  even  function,  then 


-foe 

/(*)  -  H 

fc— 0 


13.161 


and  the  uncertainty  AV  is  given  by 


Ml  =,-  H* 

,  _  E*+~o( +  4W* 

ii/ll2  1:,-I71 

..  _  f2)(2fc-f  I) 

11/11* 


If  /(:/•)  is  odd, 


+  oc 

/(*)  =  Y1 

h=0 


and  the  uncertainty  Ml  is  given  by 

All  ^ 

a  E/,  0 +  I  •'  i  V1**  1  1 

ll/ll2  " _ •  13-161 

. .  T_  E *  L1  ^  4  3)(3f*  *  3) 

ni/ii2 . 

Equations  [3. 1 6]— [3. 1 9]  follow  from  properties  of  Hermite  functions.  We  can  easily  see  that 
the  uncertainty  of  Hermite  functions  <p.,(j-)  increases  with  v  as  the  number  of  zero-crossings 
of  *’„(.r)  increases.  From  these  observations,  we  see  that  good  filters  will  be  composed 
by  Hermite  functions  with  low  w.  From  the  point  of  view  of  uncertainty,  the  optimal  even 

filter  is  >  -•  ,  and  the  optimal  odd  filter  is  r<  -•  (the  two-dimensional  case  has  been  treated 
by  Daugman,  1904a).  Another  class  of  functions  with  small  uncertainty  consists  of  Gabor 
functions 


They  are  complex  functions  of  a  real  variable  and  have  uncertainty  A f  ■’  equal  to  ' .  However, 
the  real  and  imaginary  part  of  «;.„(.i )  do  not  have  minimal  uncertainty.  The  only  real  function 
with  uncertainty  equal  to  is  the  Gaussian. 

Fillers  with  minimal  uncertainty,  as  well  as  bandlimited  filters,  satisfy  the  conditions  of 
section  3  in  order  to  regularize  differentiation. 
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Figure  2  Comparison  between  the  gaussian  and  a  prolate  function.  See  text. 


3.3.1.  Relation  between  prolate  and  Hermite  functions 

The  essential  difference  between  prolate  and  Hermite  functions  is  that  the  former  are 
band-limited  and  fall  as  while  the  latter  fall  off  faster— somewhat  too  fast  to  be 
band-limited  It  has  been  shown,  however,  that  a  crude  approximation  of  0»(.r),  when  c  is 
large  (see  Slepian  1965)  is 


V„(x)  ^  IK{xy/ir),  [3.21 1 

where  l>„  is  a  Weber  parabolic  cylinder  function.  Now 

-=  r~  f/„(xy/r)  =«  I />„(x),  (3.22] 

where  //„(j-)  are  the  usual  Hermite  polynomials.  This  approximation  fails  for  large  x ,  where 
i. ,  ( r )  falls  off  as  \  and  l>„(.r)  as  a  Gaussian  function.  However,  when  r  is  larger  than  7, 
then  and  have  more  than  99  per  cent  of  their  energy  in  |  where  the 

approximation  [3.22]  is  satisfactory.  In  Fig.  1,  we  see  the  comparison  between  a  Gaussian 

function  (dotted  line)  with  variance  equal  to  y/;  and  o,.(x)  (solid  line)  with  r  equal  to  7. 
i  ,.(x)  has  been  computed  according  to  the  approximation  described  in  Frieden  (1971). 
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3.3.2.  Gaussian  filtering  and  the  heat  equation 

We  consider  here  briefly  an  interesting  analytic  property  of  Gaussian  filtering  of  images. 

Gaussian  filtering,  i.e.  the  convolution  of  the  image  /(*,)/)  (when  l{x,y)  is  bounded  and 
continuous)  with  the  Gaussian, 


can  be  seen  as  a  solution  at  an  appropriate  time  t  —  ^  of  the  two  dimensional  heat 
equation 


with  the  initial  condition: 


i)2u  t)2u  du 
Ox2  +  iiy'1  dt ' 


«(*<!/.  0)  =  /(*,»).  [3.25] 

This  is  because  the  "source  solution"  of  the  heat  equation  (Widder,  1975)  is 

.  cii-Ltii! 

Kx>  y<  0  = -  "  .  [3-26) 

v/Trrt 

with  time  playing  the  role  of  the  variance,  that  is, 

a1  =  2t.  [3.27] 

From  Theorem  4.1  of  Widder  (1975)  the  solutions  of  the  heat  equation  are  entire  functions 
of  i  and  y.  In  other  words  the  convolution  of  a  continuous  and  bounded  function  with 
a  Gaussian  generates  an  entire  function.  This  characterizes  well  the  strong  regularizing 
properties  of  the  gaussian  filter. 


4.  Differentiation  stage 

In  this  chapter,  we  will  discuss  the  properties  of  some  differential  operators  that  have  been 
proposed  and  used  in  edge  detection.  We  first  briefly  consider  directional  derivatives  in 
section  4.1.  In  section  4.2  we  discuss  properties  of  two  second-order  rotationally  invariant 
differential  operators:  the  Laplacian  and  the  second  derivative  along  the  direction  of  the 
gradient.  We  stress  here  that  it  is  unlikely  that  zero-crossings  of  one  differential  operator 
—  such  as  the  Laplacian  —  are  sufficient  for  early  vision. 

The  many  two  dimensional  differential  operators  that  can  be  used  for  detecting  sharp  changes 
in  intensity  can  be  classified  according  to  whether  they  are  (a)  linear  or  nonlinear,  and  (b) 
directional  or  rotationally  symmetric.  In  this  paper,  we  use  the  (somewhat  inappropriate) 
terminology  of  zeros  of  a  differential  operator  l>f  (/  defined  in  l  c  St*)  in  the  sense  of  the 
locus  of  points  of  l  such  that  />/  o.  This  notion  is  different  from  the  usual  definition  of 
the  kernel  of  an  operator  l>.  that  is.  the  set  of  function  /  such  that  DJ  (l  in  I  . 


•.  •„ 
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4.1.  Directional  differential  operators  (DD) 

The  directional  differential  DD  operators  used  in  edge  detection  are  the  usual  directional 
derivai ives.  The  use  of  directional  operators  has  been  criticized  (Hildreth,  1980)  on  the 
grounds  that  such  operators  lead  to  smearing  of  zero  crossing  contours  (see  Fig.  11  of 
Hildreth  1980).  In  that  case  the  vertical  operator  was  implemented  with  an  operator  of  n  x  m 
pixels.  Smoothing  was  performed  in  both  the  orthogonal  and  the  parallel  direction  to  the 
filter's  orientation.  A  correct  implementation  of  a  vertical  derivative  however  consists  of  an 
operator  of  I  x  ™  pixels.  The  smearing  observed  by  Hildreth  (1980),  however,  is  not  due 
to  the  use  of  a  directional  operator  but  to  the  distortion  introduced  by  a  too  large  width  of 
the  operator.  The  concomitant  use  of  several  directional  derivatives  has  been  proposed  by 
several  authors  (Binford  1982;  Canny  1983).  Since  in  3?-  the  directional  derivative  in  any 
arbitrary  direction  can  be  expressed  in  terms  of  and  f!y .  it  is  evident  that  in  a  noise-free 
image  the  use  of  more  than  two-directional  derivatives  is  of  no  help  at  all.  In  a  noisy  image 
the  use  of  several  directional  derivatives  may  be  useful  for  increasing  the  signal-to- noise 
ratio. 

We  will  see  in  a  later  paper  that  the  use  of  just  two  narrow  directional  derivatives  is  sufficient 
to  detect  all  edges  detected  by  rotationally  invariant  differential  operators  or  by  a  large  set 
of  directional  derivatives. 

4.2.  Rotational  invariant  differential  operators  (RID) 

Rotationally  symmetric  operators  have  several  attractive  features.  Two  of  the  most  widely 
used  operators  of  this  class  are  the  Laplacian  (V2,  which  is  linear)  and  the  second 
directional  derivative  along  the  gradient  (£^)  which  is  nonlinear.  We  will  derive  in  this 
section  several  properties  of  the  two  derivatives  and  especially  of  their  zero-crossings.  In 
particular,  we  derive  a  necessary  and  sufficient  conditions  on  image  intensities  for  the 
zero-crossings  of  the  two  derivatives  to  coincide. 


4.2.1.  Null  space  of  the  Laplacian  and  subharmonic  functions 

Certain  classes  of  functions  do  not  originate  zero-crossings  in  the  Laplacian:  they  are 
harmonic  and  subharmonic  functions.  Harmonic  functions  are  the  null  space  of  the  Laplacian 
operator.  Interestingly,  they  are  invariant  with  respect  to  heat  diffusion  and  therefore  do  not 
change  under  convolution  with  a  gaussian  of  any  size  (Yuille.  pers.  comm.).  This  property, 
however,  is  not  stable.  Another  non  trivial  result  is  that  any  non-linear  function  <i>  of  an 
harmonic  function  has  zero  crossings  at  the  locations  of  the  inflection  points  of  4  (Yuille, 
Poggio  and  Ullman,  pers.  comm.).  Harmonic  functions  are  non-generic,  in  the  sense  that  a 
small  perturbation  destroys  the  harmonic  property. 

Subharmonic  functions  are  such  functions  that  the  modulus  of  their  Laplacian  is  everywhere 
positive  (Duugman,  1984a).  These  functions  are  robust  against  small  perturbations. 


4.2.2.  Cartesian  and  polar  form 

We  just  give  the  explicit  representation  of  the  two  operators  in  cartesian  and  polar 
coordinates: 


V-7  /„  .  Jy 


<#/»“  /i  t ift  ft-  < H) - 


.%\ii 
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=  /x/-  '  2Uyf,y  4  fljyy  =  ,2J>f  »/  {>*!_  jJiUVjll 
< ht 2  /  ;  -t  /•  p1  i)f>  HO  iipiiO  (>  *  \  <>0  )  <)0'1 

_  1  V  PA’  a* _ I _  M 

/)■'<  <)/>  V  no  )  v  d/>  /  < V2  / ,  /  a/\2  ’ 

+  p-VMJ 

We  also  give  the  explicit  representation  for  the  second  directional  derivative  in  the  direction 
orthogonal  to  the  gradient: 


r'l2/  rif**-V,Ivf*v+llfyv 

‘>n'i  h  +  n 


m 


Remark 

The  representation  in  polar  coordinates  shows  clearly  that  the  two  operators  are  rotationally 
symmetric,  since  their  form  does  not  change  for  a  rotation  of  the  coordinate  system  O'.  We 
can  now  state 

•Characteristic  Property  of  Rotationally  Symmetric  Operators.  A  sufficient  condition  for 
an  operator  to  be  rotationally  invariant  is  that  0  appears  only  as  derivative  in  the  polar 
representation  of  the  operator. 


4.2.3.  Simple  properties  of  V2  and 

Marr  and  Hildreth  (1980)  had  attempted  to  prove  that  in  most  cases  the  zero-crossings 
of  the  Laplacian  coincide  with  the  intensity  edges.  Since  zeros  of  the  second  directional 
derivative  along  the  intensity  gradient  are  the  natural  definition  of  intensity  edges,  we  are 
able  to  give  here  a  more  rigorous  characterization  of  the  problem,  in  terms  of  four  simple 
properties. 

(I)  If  the  image  f(x,y)  can  be  represented  as  a  function  of  only  one  variable,  i.e.,  /( j-,  ;/„), 

the  two  operators  V2  and  ~  are  equivalent,  i.e.,  —  V2/. 

As  a  consequence,  for  f(r,yn)  the  zeros  of  :'‘Ji  and  of  V2/  coincide. 

Property  I  is  similar  but  not  identical  to  the  “linear  variation"  result  of  Marr  and  Hildreth 
(1980),  which  states  that  if  /  changes  at  most  linearly  along  the  edge  direction  /,  then 

v2/  --  ‘~4 

(II)  If  —  JIV  0  at  /’,  when  --  0,  the  zeros  of  coincides  with  the  zeros  of  V2/. 

The  assumptions  on  the  image  are  here  stronger  than  the  condition  of  linear  variation  of 
Marr  and  Hildreth  (1980).  but  are  equivalent  to  the  assumptions  of  their  theorem  1:  locally 
around  the  zero-crossing,  /  has  the  form  f(r,  y)  -  ax  i  by  +  r. 

(III)  If  f(x,  y) .---  /(/i)  is  rotationally  symmetric.  V2/  and  differ  by  the  additive  term  ''JJ. 

For  circularly  symmetric  functions,  the  zeros  of  V2/  are  farther  apart  than  the  zeros  of  . 
This  lack  of  localization  by  V2  (for  circularly  symmetric  patterns)  can  also  be  seen  in  the 
fact  that  zeros  of  V2  (but  not  of  ,“, )  “swing  wide"  of  corners. 


(IV).  (a)  is  nonlinear 

(b)  neither  commutes  nor  associates  with  the  convolution,  i.e.. 
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/)/(£*»)*/  M 

(( )  j- 1  is  a  linear  operator  on  /,  it  /  /(/>),  but  not  shift  invariant. 

(c  )The  mean  ot  j-'..  applied  to  a  zero  mean  function  need  not  to  be  zero. 


4.2.4.  Geometric  characterization  of  the  zeros  of  Va  and 

It  is  interesting  to  consider  under  which  conditions  the  zeros  of  the  Laplacian  coincide 
with  the  zeros  of  the  second  directional  derivative  along  the  gradient.  Zeros  of  the  second 
directional  derivative  along  the  gradient  are  a  natural  way  of  characterizing  and  localizing 
intensi  y  edges.  Zeros  of  the  Laplacian,  however,  are  extensively  used  for  their  computational 
converience.  In  this  section  we  derive  rigorous  results  that  clarify  completely  this  set  of 
questicns. 

Let  us  consider  the  intensity  surface  represented  as  A'  —  (1,9,2),  where  2  =  /(x,  y)  with 
/  <_  ("(/>),  I)  c  and  r  >  2. 

The  moan  curvature  of  the  surface  A'  is 

Jf  /‘.W  I-  (llj  -  2  F  M  (I  +  fx)fvV  t'  (I  +  f  y)fxx  ~  2/z/lf/xy  (J 

"  =  =  2gi  ’  1  B) 


where 


H  =•■  I  *  f'i  r  ==  JJy  ('■  =-- 1  +  fl  H-7] 

are  the  coefficients  of  the  first  fundamental  form  l(<lr,<ly)  (Lipschutz,  1969,  Pogorelov, 
1965),  and 


/ 


f"  M  -  Sly  /V  -  ^v, 
<j  u  < 7 


[4-7] 


with  </-  -  I  .  /;  1  /;],  are  the  coefficients  of  the  second  fundamental  form  ll(<lx,<ly). 
We  use  equations  [4. 2).  [4.3]  and  [4.8]  and  the  property 


for  writing  II  in  terms  ot  V-  and  : 

4(/rv  ,v/)"4) 


// 


[4-8] 


[4.10] 


We  can  now  characterize  the  connection  between  the  zeros  of  V-  and  the  zeros  of  : 

tm  -  * 


Piopctiv  V  If  V/  /  11.  the  zeros  of  /  coincide  wilh  the  zeros  of  V-  iff  tfie  mean  curvature 

II  is  zero. 
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Thus,  only  for  surfaces  with  minimal  curvature  (//  =  0).  the  zeros  of  ^  coincide  with 
the  zeros  of  V-/  where  the  gradient  of  /  is  different  from  zero.  Note  that  (M.  Kass, 
personal  communication)  V2/  has  the  same  zeros  as  where  the  curvature  of  the  lines 
of  level-crossings  of  the  intensity  image  is  zero.  Recently.  Berzins  (1984)  analyzed  in  detail 
the  behavior  of  zeros  of  the  Laplacian  of  a  Gaussian  filtered  image  around  corner  edges 
and  edges  with  high  curvature.  He  showed  that  the  zeros  of  the  Laplacian  are  displaced 
from  the  true  edge  by  less  than  o  (the  variance  of  the  Gaussian  filtering)  when  the  radius 
of  curvature  is  large  compared  to  o,  and  when  the  distance  to  the  nearest  sharp  corner  is 
large  compared  to  ~  (where  H  is  the  angle  of  the  corner  in  radians).  Note  that  eq.  [4.10] 
shows  that  the  difference  between  21/  and  v-/  is  small  if  the  mean  curvature  II  is  small. 
Smoothing  the  image  with  a  two-dimensional  filter  reduces  the  curvature  (and  more  so 
tor  larger-sized  filters).  Therefore,  we  may  expect  that  in  littered  images,  V2/  will  perform 
almost  as  well  as 


4.2.5.  The  normal  curvature 

The  second  directional  derivative  along  the  gradient  has  a  simple  interpretation  in  terms  of 


the  normal  curvature  along  the  gradient.  The  normal  curvature  Kn 
gradient  is  (Lipschutz,  1969) 

in  the  direction  of  the 

Ldu 2  +  2  Mdvdv  +  Ndv* 

"  Edu 2  +  2  Fdudv  +  (Jdv2 

[4.11] 

with  du  and  dv  as  direction  numbers.  Setting  du*  +  dv*  =  l ,  the  direction  numbers  along 
the  gradient  are 

.  /» 

|V/| 

[4.12] 

II 

■5 

[4.13] 

Thus,  equations  [4.11]  and  [4.13],  together  with  equations  [4.6]  and  [4.7],  lead  to 


K  „  = 


iiL/. 

g3  tin2  J 


[4.14] 


In  particular,  it  follows 
Property  VI 

The  second  directional  derivative  along  the  gradient  and  the  normal  curvature  in  the 
direction  of  the  gradient  have  the  same  zeros  when  |V/|  f  o. 

Our  geometrical  characterization  of  the  gradient  and  the  second  derivative  along  the 
gradient  is  completed  by  Appendix  3,  that  gives  the  geodesic  curvature  of  the  curve  directed 
along  the  gradient.  For  surfaces  of  revolution  the  geodesic  curvature  of  such  lines  is  always 
zero. 

The  operator  j'J.  and  the  normal  curvature  in  the  direction  of  the  gradient  l<„  are  not 
defined  when  |C/|  ~  <i  In  this  case,  the  direction  of  the  gradient  is  underdetermined, 
although  the  Hessian  can  of  course  be  diagonalized  (determining  the  principal  directions). 
Thus.  has  the  disadvantage  with  respect  to  VJ  that  it  is  not  defined  everywhere. 
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4.2.6.  Potential  biological  consequences 

A  natural  question  arising  from  these  comparisons  is:  which  derivative  operators  are  used 
by  the  human  visual  system?  It  is  obvious  from  the  earlier  sections  that  several  different 
derivat  ves  possibly  at  different  scales  have  to  be  used  for  efficient  edge  detection.  It  would 
be  very  strange  if  the  human  visual  system  would  make  use  of  only  one  differential  operator. 

The  important  questions  is  therefore  which  operators  or  combinations  thereof  are  used 
in  diffe  rent  visual  tasks  and  under  different  conditions.  Zero-crossings  in  the  output  of 
directional  second  derivatives  approximated  by  the  difference  of  one-dimensional  Gaussians 
(DOG)  were  suggested  by  Marr  &  Poggio  (1977)  in  their  theory  of  stereo  matching.  Marr  & 

Hildreth  (1980)  later  proposed  the  rotationally  symmetric  Laplacian  V'-Y;  (approximated  by 
a  rotat  onally  symmetric  DOG)  for  edge  detection  and  for  stereo  matching.  Psychophysical 
evidence  does  not  rule  out  either  of  these  schemes.  Physiology  shows  that  a  class  of 
retinal  ganglion  cells  performs  a  roughly  linear  operation  quite  similar  to  the  convolution 
of  the  image  with  the  Laplacian  of  a  Gaussian.  Data  on  cortical  cells  are  still  somewhat 
contradictory  on  whether  some  simple  cells  may  perform  the  equivalent  of  a  linear  directional 
derivative  operation,  or  instead,  signal  the  presence  of  a  zero-crossing  of  the  rotationally 
symmetric  V'*Y.\ 

On  physiological  grounds,  it  seems  unlikely  that  retinal  cells  could  perform  the  rotationally 
symmetric  nonlinear  operation,  although  not  all  classes  of  ganglion  cells  have  been 
tested  properly  to  allow  a  firm  conclusion.  In  particular,  one  dimensional  and  rotationally 
symmetric  patterns  are  customarily  used  as  stimuli  for  physiological  experiments.  In  the  first 
case  and  V-  are  equivalent,  whereas  in  the  second  case,  they  may  be  distinguishable 
only  quantitatively.  Let  us  now  consider  three  classes  of  psychophysical  experiments. 

(I)  An  interesting  possibility  for  distinguishing  the  Laplacian  from  the  directional  second 

derivative  on  the  basis  of  physiological  or  psychophysical  experiments  is  suggested  by  the  r— 

observation  that  the  zero  crossings  of  the  Laplacian  “swing  wide"  of  gray-level  corners.  In 

particular,  the  zero-crossings  associated  with  an  elongated  blac1-  bar,  for  example,  coincide 

for  V-  and  ,  whereas  they  differ  in  the  case  of  a  circular  black  disk.  Hyperacuity 

experiments  may  allow  one  to  distinguish  the  two  cases.  Notice  that  both  operators  are 

- 

linear  in  this  case.  They  associate  therefore  with  Gaussian  convolution  ((!  =  cJi»).  The 
corresponding  point  spread  functions  are 

(a)  for  the  one  dimensional,  f[x): 


(II)  for  rhe  two  dimensional  /(/>) 


where  .>  is  the  standard  deviation  of  the  Gaussian  function.  Let  us  call  w  the  diameter  of 
the  central  region  of  these  masks,  i.e..  the  distance  between  the  central  zeros.  »’i/>  denotes 
the  diameter  for  the  one -dimensional  case  and  for  the  two-dimensional  case.  It  is  easy 
to  see  that  the  second  directional  derivative  has  whereas  this  is  not  true  for 

the  Laplacian  u  \n  /  From  (a)  and  (b)  we  get 
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A  possible  psychophysical  test  is: 

•If  zero-crossings  in  the  Laplacian  are  used  by  our  visual  system  to  estimate  position  of 
edges,  the  apparent  width  of  a  narrow  1-D  bar  and  of  a  small  circle  (with  equal  physical 
widths)  should  be  different— the  bar  should  appear  smaller.  This  is  not  expected  if  the 
second  directional  derivative  is  used. 

(II)  There  are  classes  of  intensity  edges  that  generate  zeros  in  £'.}  but  not  in  V2.  An 
example  is  given  by: 


/(*.»)  =  (  i  +  [4A9] 

which,  with  appropriate  values  of  ft  does  not  satisy  V2/  =  0  for  any  y  >  o.  It  is  possible, 
however,  to  find  solutions  to  /  — -  0.  Thus,  the  edge  /  could  again  be  used  to  discriminate 
psychofthysically  between  V-  and 

More  in  general,  functions  b  c  C*  in  a  certain  region  I)  such  that  V~h  >  0  in  I)  are  called 
subharmonic,  as  we  discussed  earlier.  These  functions  do  not  have  zero-crossings  of  the 
Laplacian  (Daugman,  1984a),  but  generally  zero-rossings  of  jgl  are  present.  There  are 
special  cases,  however,  in  which  both  ~  and  V2  do  not  have  zeros.  An  example  is  given  by 
J(i)  —  con  x  +  hr-  with  V'-’/  =-  £:  f  =  -  cos2  x  +  2b,  which  does  not  have  any  zero-crossings 
if  b  >  It  would  be  interesting  to  test  this  kind  of  pattern  both  psychophysically  and 
physiologically  (controlling  carefully  for  nonlinearities  in  the  phototransduction). 

Ill)  As  we  mentioned  earlier  in  this  chapter,  harmonic  functions  cannot  be  characterized 
in  terms  of  the  zero-crossings  of  their  Laplacian.  Worse  yet,  any  image  is  characterized 
uniquely  by  zero-crossings  of  the  Laplacian  (across  gaussian  scales,  see  chapter  6) 
modulus  any  harmonic  function.  Psychophysical  experiments  that  measure  the  detectability 
of  edges  in  subharmonic  patterns  are  difficult  to  interpret,  because  they  would  give  a 
clear  answer  only  if  the  Laplacian  were  the  only  differential  operator  in  the  human  visual 
system,  a  very  unlikely  possibility.  Furthermore,  harmonic  functions  are  unstable  against 
small  perturbations,  making  difficult  to  control  for  non-linearities  in  the  display  and  in  the 
transduction  process. 


i-a 

m  ,.J 


5.  Geometrical  properties  of  edge  contours 


In  this  section,  we  will  discuss  geometrical  properties  of  edge  contours  obtained  by  different 
methods.  We  will  show  that  edges  derived  through  rotational  operators  are  generally  smooth, 
closed  curves,  while  edges  obtained  with  directional  operators  do  not  have  such  special 
geometrical  properties. 

In  many  edge  detection  schemes,  as  we  discussed  in  the  Introduction,  the  image  /( j-,  y)  is 
first  filtered  and  then  a  second  order  differential  operator  /)•  is  applied  to  the  filtered  image 
/(r,  y).  Edges  are  identified  in  correspondence  to  the  zero-crossings  of  l>-l(x,y).  In  other 
cases,  edges  are  identified  as  extrema  of  some  derivative  of  the  filtered  image.  Again  they 
may  correspond  to  zero-crossings  of  a  higher  order  derivative.  In  this  way,  the  first  part 
of  edge  detection  provides  a  compact  and  possibly  complete  representation  of  intensity 
changes  (see  chapter  6). 

Therefore,  it  is  important  to  analyze  theoretically  geometrical  properties  of  the  locus  of 
points  defined  by 


!l)  0, 
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where  ?(x,  y)  is  the  filtered  image  and  D'1  can  be  a  RID  or  a  DD  operator.  We  first  recall 
in  the  next  two  sections  the  notions  of  transversality  (Abraham  and  Robbin,  1967,  Poston 
and  Stewart,  1976)  and  of  Morse  functions  (Poston  and  Stewart,  1976).  In  section  5.4  we 
will  classify  the  types  of  zero-crossings  contours  that  can  appear  in  images. 

5.1.  Transversality  and  zero-crossing  (z.c.) 

A  curve  (or  a  surface)  .s',  meets  a  curve  (or  a  surface)  S\  in  /’  transversally  when  the 
tangent  space  7’.s’,  to  S,  in  /'  and  the  tangent  space  TS-,  to  S->  in  /'  have  locally  around  /> 
an  empty  intersection.  More  generally,  two  subspaces  ll,  V  of  9?”  are  transverse  if  they  meet 
in  a  subspace  whose  dimension  is  as  small  as  possible.  From  this  definition,  it  follows  that 
the  surface  .s’,  =  (x,y,f(x,y))  meets  the  surface  Su  =  (z,y,0)  in  P  =  (z,j),0)  transversally  iff 
•n  (i,y) 

|grad/|  ^  0.  [5.2] 

The  isotopy  theorem  (Thom,  1954)  shows  that  transversal  intersections  are  structurally 
stable.  The  converse  is  also  true  in  that  non-transversal  intersections  are  structurally 
unstable.  Transversality  (and  the  implicit  function  theorem)  indicates  that  if  5,  meets  S0 
transversally  in  V  then  the  intersection  of  S,  and  S„  around  P  is  a  smooth  curve. 

The  previous  result  is  only  local.  Globally  we  find  that  if  f(x,  y)  is  defined  in  the  compact 
domain  V  e  II-  whose  boundary  is  6  V  and  if  S/  always  meets  S„  transversally,  then  the 
intersection  of  .S’,  and  S„  consists  of: 

(a)  smooth  closed  curves  l\  €  V. 

(b)  smooth  curves  l\  that  terminate  in  6V. 

In  other  words,  transversality  of  zero-crossings  means  that  zero  crossing  contours  are 
closed  curves  or  curves  that  terminate  at  the  boundary  of  the  image. 

5.2.  Closed  and  open  contours  of  z.c. 

From  tho  previous  section  a  necessary  and  sufficient  condition  for  transversality  in  P  is: 

|grad#*/(x,y)|/>  ^  0.  [5.3] 

A  preliminary  condition  required  by  eq.  [5.3]  is  that  />*/(*,»)  is  a  differentiable  function. 
This  condition  is  obviously  met  if  l(x,  y)  is  analytic  (or  entire  or  band-limited).  But  we  have 
already  stressed  that  it  is  safer  to  suppose  that  the  original  image  l(x,y)  is  a  piece-wise 
continuous  function  or  belongs  to  f  ”,  with  n  not  known  a  priori.  If  we  filter  the  original 
image  l(x,y)  with  an  appropriate  rotational  filter,  then  /(x,  ?,)  is  analytic  both  in  x  and  y, 
and  y)  o  defines  a  differential  function.  On  the  other  hand,  if  we  use  a  directional 
filter  /.  far  example  along  z,  we  have 

/(x,  y)  —  l{x,  y)  *  J(x)  (5.4) 

and  there  is  no  reason  for  /(.r.  y)  to  be  a  three  times  differentiable  function  of  y.  Therefore, 
if  the  original  image  has  been  filtered  with  a  directional  operator  only,  it  is  possible  that 
the  zero-crossings  of  l)-l(x.y)  may  not  be  smooth  curves. 

5.3.  Morse  functions 

A  function  J-.H-  •  »  li  is  called  a  Morse  function  if  at  its  critical  point  (i.e.,  points  where 
grad/  o).  the  Hessian  is  nondegenerate.  Morse  functions  have  the  following  properties: 
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(a)  Suppose  that  f(i,y)  ~  ()  and  |grad/|,,  --  0  with  /’  —  (>,?/),  but  the  Hess  (/),,  is  non 
degenerate.  Thus,  there  is  a  smooth  local  change  of  coordinates  around  /'|grad(x,  ?/),,!, 
such  that  /  takes  the  exact  form 


/(*. y) 


1  a*/ 

2  fix'* 


<r-f 

+  iMyl‘‘TV  + 


t  ft2/ 
2  <)y 


(b)  A  small  enough  perturbation  of  a  Morse  function  /  can  always  be  expressed  in  the 
same  form  as  the  original  /  by  a  change  of  coordinates  and  of  scale. 

Property  (a)  says  that  around  l‘  the  function  /  has  the  behavior  of  the  quadratic  form 
induced  by  the  Hessian.  Property  (b)  is  a  kind  of  structural  stability  pro;*erty  A  basic 
property  of  Morse  functions  is  that  they  are  dense  so  that,  if  /  is  a  non-Morse  function,  then 
an  arbitrary  small  perturbation  of  /  makes  /  a  Morse  function  (obviously  the  perturbation 
must  not  vanish  at  the  critical  points).  This  is  the  reason  of  the  importance  of  Morse 
functions  here:  we  can  always  assume  that  images  are  Morse  functions  (especially  because 
of  the  unavoidable  noise). 


5.4.  Classification  of  z.c. 

We  now  analyze  the  geometrical  properties  of  the  z.c.  contours,  i.e.,  the  locus  of  points 
defined  by 


iyii(x,y)  =  h(x,y)  =  0.  (5.7] 

(a)  If  h(x,  y)  is  not  a  smooth  function  of  x  and  y  (at  least  <?'),  the  implicit  function  theorem 
cannot  be  used  and  the  z.c.  may  be  isolated  points,  i.e.,  segments  of  intersecting  curves 
and  2-D  regions. 

(b)  If  h(x,y)  is  a  smooth  function  of  x  and  y  and  if  in  /’  =  (x,y)  we  have 

h(r,  y)  —  o  and  |gradA(x,  y)|/>  ^  0, 

then  h(x,y)  has  in  V  a  “transversal  zero-crossing",  which  is  a  smooth  curve. 

(c)  If  h(x,  2/)  is  a  smooth  function  and  in  /*  we  have 

h(x,y)  =  0  and  jgrad/^x.y),,]  =  o  [5.8] 

but  around  h(x,  y)  we  find 

h{x,  y)  =  fix2  +  bry  +  ry1  f  ()(xnym)  n  +  m  =  3,  |5.9] 

where  n  —  £/„(,„  6  —  c  =  \lvv\i-  The  zer0  crossing  /*  is: 

(i)  an  elliptic  z.c.,  if  Hess  /*(x,2/)|,,>  0  (see  Fig.  2A). 

(ii)  a  hyperbolic  z.c.,  (saddle  point)  if  Hess  /i(x,y)|,,<  0  (see  Fig.  2B). 

(iii)  a  parabolic  z.c.,  if  Hess  b(x,y)jr—  0  but  a,  b  and  r  are  not  identical  to  zero  (see  Fig. 
2C). 

(d)  If  h(x,y)  is  a  smooth  function  and  if  in  /’  we  have 

/i(x,  i/)  -----  0  |grad/(x,  ?/)| ,,  -----  0, 
and  in  r,h[x,y)  depends  on  the  third  order  terms, 
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h(x,y)  --  os*  +  flx*y  +  i/irj/2  +  rfj/'1  +  T)(:rnym),  n  +  m  =  4,  (5. 10) 

where  he  coefficients  j  and  t>  are  obtained  by  the  Taylor  expansion.  It  is  easy  to  see 
that  the  set  of  points 

Ha  --  {(•*■.  y):'**'1  t-  /f**y  +  a**/2  t  *y3  ~  0}  (•r>  H) 

are  straight  lines.  The  z.c.  lines  may  be: 

(i)  an  elliptic  umbilic,  if  ltA  consists  of  three  lines  (see  Fig.  2D). 

(ii)  a  hyperbolic  umbilic,  if  liA  consists  of  a  single  real  line  (see  Fig.  2E) 

(iii)  a  parabolic  umbilic,  if  UA  consists  of  three  lines,  two  of  which  are  coicident  (see  Fig. 
2F) 

(iv)  a  symbolic  umbilic,  if  HA  consists  of  three  coincident  lines. 

(e)  If  h(.c,  y)  is  a  smooth  function  and  in  /’  we  have 


/i(z,  y)  =  0 

and  in  /',  /<( x,  y)  depends  on  the  fourth  order  terms,  the  z.c.  lines  have  a  complex  shape 
that  can  be  analyzed  using  results  of  Poston  &  Stewart  (1976). 


Bifurcations  of  zero-crossings 

The  isotopy  theorem  (Thom,  1954,  Abraham  and  Robbin,  1967)  shows  that  transversal 
intersections  are  structurally  stable,  i.  e.  that  "transversal  zero-crossings"  are  structurally 
stable,  their  topological  properties  do  not  change  if  the  size  of  the  filter  is  slightly  changed. 

If  /(.r,  y)  is  a  Morse  function  then  .s',  may  meet  .S'„  non-transverally,  and  these  intersections 
are  not  structurally  stable  (observe  that  Morse  functions  are  structurally  stable  but  not  their 
intersections  with  ,s'„).  If  /  is  a  Morse  function,  then  .S',  may  meet  .s’„  non-transversally  at 
elliptic  points  and  hyperbolic  points.  These  intersections  are  not  structurally  stable  and  may 
change  heir  topological  properties  for  small  perturbations  of  /.  More  specifically  we  may 
have  two  bifurcations: 

(0  Elliptic  z.c.  At  elliptic  z.c.,  a  small  perturbation  of  f  may  lead  to  the  disappearance  of 
the  z.c.  or  to  the  appearance  of  a  contour  of  z.c.  constituted  by  a  closed  curve. 

tii)  hypeibolic  z.c.  At  hyperbolic  z.c.,  which  consists  of  the  intersection  of  two  curves  any 
small  perturbation  leads  to  the  breaking  of  the  intersection  of  the  two  curves  and  the 
appearance  of  two  disjoint  curves. 

These  are  the  two  bifurcations  that  may  appear  when  h(x,  y)  is  a  Morse  function.  Interestingly 
enough,  the  zero- crossing  contours  obtained  with  real  images  (which  will  be  explored  in  a 
later  paper)  can  be  classified  as  type  (b)  and  (c)  of  the  previous  section;  Morse  functions 
can  have  z.c.  only  of  type  (b)  and  (c).  The  two  types  of  bifurcation,  that  may  originate  with 
Morse  functions  are  illustrated  in  Fig.  3A  and  3B.  respectively  (see  also  Koenderink  and 
van  Doom.  1979).  Yuille  and  Poggio  (1983a.  1983b)  have  shown  that  (if  Gaussian  filtering 
is  used)  when  the  scale  of  the  filter  is  changed  (i.e.  n).  the  second  type  of  bifurcation 
may  appear  either  when  n  is  increased  or  decreased,  but  the  first  type  of  bifurcation  only 
occurs  when  n  is  increased.  Thus,  the  Gaussian  filter  forbids  creation  of  a  zero  crossing 
contour  from  on  elliptic  z.c.  for  increasing  o.  It  is  important  to  note  that  all  these  topological 
properties  are  also  valid  for  level-crossings.  Thus  setting  a  threshold  in  the  output  of  the 
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Figure  3  The  zero  crossing  points  may  be  of  the  elliptic  (A).  hyperbolic  ( B ),  parabolic 
(C)  type ;  the  zero  crossing  lines  can  also  be  an  elliptic  umbilic  (D),  a  hyperbolic  umbilic 
(E)  or  a  parabolic  umbilic  (F).  See  text. 

filtering  and  derivative  operation  preserves  all  topological  and  geometrical  properties  of 
zero-crossings. 

In  summary,  we  have  characterized  the  geometrical  properties  of  zero  crossing  contours: 
these  properties  —  lor  instance  the  fact  that  zero  crossing  contours  are  closed  -  may  be 
exploited  in  various  ways  in  edge  detection  and  even  in  stereo-  or  motion  matching. 
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Figure  4  The  two  types  of  bifurcations  that  can  occur  for  increasing  (left  to  right )  and 
decreasing  (right  to  left)  n  in  the  case  of  Morse  functions.  See  text. 


6.  EDGE  CONTOURS  AND  FILTER  SCALE 


As  we  have  seen,  differential  operations  on  sampled  images  require  the  image  to  first  be 
smoothed  by  filtering.  The  filtering  operation  introduces  an  arbitrary  parameter  -  the  scale 
of  the  filter,  eg.,  the  standard  deviation  for  the  Gaussian  filter.  In  computer  vision,  the 
advantages  of  using  several  scales  of  filtering  was  realized  quite  early  on,  and  this  was 
supported  by  evidence  suggesting  the  presence  of  filters  of  several  sizes  in  the  human  visual 
system  (Rosenleld,  1982:  Marr,  1976;  Marr  and  Poggio,  1979;  Marr  and  Hildreth.  1980)  More 
recently.  Witkin  (1983;  see  also  Stansfield.  1980)  introduced  a  scale  space  description  of 
zero  crossings  which  gives  the  position  of  the  zero  crossing  across  a  continuum  of  scales, 
i.e..  sizes  of  the  Gaussian  filter  (parametrized  by  the  c  of  the  Gaussian)  The  signal — or  the 
result  of  applying  to  the  signal  a  linear  (diffeiential)  operator — is  convolved  with  a  Gaussian 
filter  over  a  continuum  ol  sizes  ol  the  filter.  Zero  or  level  crossings  ol  the  filtered  signal 
are  contours  on  the  r  n  plane  and  surfaces  in  the  s,i/,n  space.  Wilkin  proposed  that 
this  concise  map  can  be  effectively  used  to  obtain  a  rich  and  qualitative  description  of  the 
signal  Yuille  and  Poggio  (1983a.  1903b)  —  who  called  the  maps  of  zero  crossings  across 
scales  hnner prints  —  have  established  interesting  relationships  between  multiresolution 
analysis,  the  Gaussian  filter  and  zero-crossings  ol  filtered  signals  Their  main  results  are 


Torre,  Poggio 


On  Edge  Detection 


two  theorems: 

(a)  zero-  and  level-crossings  of  an  ‘mage  filtered  through  a  Gaussian  filter  have  nice  scaling 
properties,  i.e.,  a  simple  behavior  of  zero-crossings  across  scales.  Zero-crossings  are  not 
created  as  the  scale  increases.  The  Gaussian  filter  is  the  only  filter  that  has  this  nice  scaling 
behavior  (see  also  Babaud.  Witkin  and  Duda,  1983). 

(b)  The  map  of  the  zero-crossings  across  scales  determines  the  filtered  signal  uniquely 
for  almost  all  signals  in  the  absence  of  noise.  The  scale  map  obtained  by  Gaussian  filters 
is  thus  a  complete  representation  of  the  image.  This  result  applies  to  level-crossings  of 
any  arbitrary  linear  differential  operator  of  the  Gaussian  (modulus  the  null  space  of  the 
differential  operator  and  provided  there  are  at  least  two  zero-crossing  contours),  since  it 
applies  to  functions  that  obey  the  diffusion  equation. 

The  first  result  sheds  some  light  on  the  properties  of  zero-crossings  and  level- crossings 
at  different  scales  with  the  Gaussian  filter.  It  supports  the  use  of  the  Gaussian  filter  in 
a  multiresolution  edge  detection  scheme.  Reconstruction  of  the  signal  is,  of  course,  not 
the  goal  of  early  signal  processing.  Symbolic  primitives  must  be  extracted  from  the  signals 
and  used  for  later  processing.  The  second  result  implies  that  scale-space  fingerprints  are 
complete  primitives,  that  capture  the  whole  information  in  the  signal  and  characterize  it 
uniquely.  Subsequent  processes  can  therefore  work  on  this  more  compact  representation 
instead  of  the  original  signal  (see  Asada  and  Brady,  1984). 

The  second  theorem  has  theoretical  interest  in  that  it  answers  the  question  of  what 
information  is  conveyed  by  the  edges  identified  with  zero-  and  level-crossings  of  m  altiscale 
Gaussian  filtered  signals.  It  is  furthermore  interesting  that  this  complete  representation 
happens  to  coincide  with  the  basic  scheme  for  edge  detection  discussed  in  this  paper. 
From  this  point  of  view  it  can  be  argued  that  the  fingerprint  representation  makes  explicit 
exactly  the  information  that  is  needed  on  physical  grounds,  i.e.,  it  makes  explicit  edges  in 
the  image. 

It  may  be  asked  at  this  point  what  the  right  sequence  is  for  the  two  steps  of  differentiation 
and  filtering  For  linear  operators  the  order  is  of  course  immaterial,  since  they  commute, 
it  is  not  so  for  nonlinear  operators,  such  as  the  directional  derivative  along  the  gradient. 
The  regularization  argument  for  the  filtering  step  implies  that  filtering  at  one  scale  must 
precede  the  differentiation  operation.  The  computation  of  different  scales  requires  filtering 
at  a  range  of  resolutions  after  differentiation.  The  reason  is  that  the  theorems  of  Yuille  and 
Poggio  (1983a.  1963b)  hold  true  even  for  the  identity  operator,  but  are  not  necessarily  valid 
if  filtering  is  performed  before  a  nonlinear  differential  operation.  In  particular,  Gaussian 
scaling  after  the  nonlinear  directional  derivative  along  the  gradient  does  not  have  a  nice 
scaling  behavior  Thus  filtering  as  a  regularizing  operator  must  be  performed  first  at  one 
scale  and  filtering  at  different  scales  must  be  performed  after  the  differential  operation.  For 
linear  differential  operators  this  is  equivalent  to  a  multiscale  filtering  either  before,  after,  or 
together  with  the  differential  operation  (e  g  the  Laplacian  of  the  Gaussian). 


7.  OVERVIEW  OF  SOME  EDGE  DETECTORS 

In  this  section,  we  will  briefly  compare  our  main  conclusions  with  several  edge  detectors 
presented  in  the  literature  Our  review  is  neither  intended  to  be  exhaustive  nor  does  it  aim 
to  present  edge  detectors  in  full  detail 

7.1.  Difference  of  boxes  (DOB) 

Binford  and  coworkers  (Herskovitz  and  Bmford.  1970:  Horn.  1972;  Binford,  1981)  have 
suggested  the  use  of  support  timit-Kl  filters  in  the  filtering  stop  of  edge  detection.  They  have 
used  the  Haar  function  |  t  fdij  in  directional  filtering  or  a  difference  of  (unctions  of  the  type 
|3  8a|  for  rotational  filtering  I  here  are  two  problems  using  this  approach: 
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(1)  Filtering  with  support-limited  functions  does  not  regularize  the  image  intensity  profile; 
therefore  the  use  of  any  differential  operator  is  unsafe. 

(2)  A  strictly  support-limited  filter,  such  as  a  DOB,  cannot  be  correctly  sampled,  and  it  is 
very  difficult  to  obtain  a  good  digital  representation. 

7.2.  Slianmugam,  Dickey  and  Green 

Shanm  jgam,  Dickey  and  Green  (1979)  looked  for  a  linear,  band-limited  operator  that  would 
yield  maximal  output  energy  within  a  given  spatial  interval  in  the  vicinity  of  the  edge.  No 
explicit  reference  was  made  to  a  differentiation  step  in  edge  detection.  They  proposed  that 
the  optimal  filter  for  an  ideal  edge  S(x),  has  a  Fourier  transform 
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where  t,  is  a  constant,  Vi(*.  <■)  is  a  linear  prolate  function  (see  section  3.1).  This  edge 
detector  performs  very  poorly  on  localization  and  has  the  intrinsic  feature  of  giving  two 
maxima  of  energy  in  the  output  of  the  response  to  an  edge.  The  reason  is  simply  that 
using  an  even  filter  such  as  eq.  [7.1]  which  has  the  same  shape  of  edges  are 

located  at  the  zero-crossing  of  the  output  and  not  at  the  extrema.  Moreover,  these  authors 
use  properties  of  linear  prolate  functions  (their  eq.  1)  to  derive  their  optimal  filter  which  are 
valid  in  1-D,  but  not  in  2-D  when  linear  prolate  functions  are  extended  in  2-D  by  rotation. 
In  addition,  their  asymptotic  approximation  to  the  optimal  filter  was  incorrect,  as  shown  by 
Lunscher  (1983). 


7.3.  Marr  and  Hildreth 

Marr  &  Hildreth  (1980),  and  Hildreth  (1980),  extending  the  work  of  Marr  and  Poggio  (1979), 
have  proposed  an  edge  detection  scheme  based  on  a  filtering  step  consisting  of  a  2-D 
symmetric  Gaussian  followed  by  the  localization  of  zero-crossings  of  V*?(*,y),  where  /(x,y) 
is  the  filtered  image.  This  edge  detect or  performs  rather  well,  but  its  optimality  was  not 
rigorously  proved.  Indeed, 

(1)  —  in  many  instances  achieves  a  better  localization  than  V2,  particularly  for  rounded 
edges  with  large  curvature. 

(2)  The  use  of  directional  filters  and  directional  derivatives  when  performed  correctly  does 
not  give  rise  to  the  problems  that  forced  Marr  and  Hildreth  to  reject  such  edge  detection 
schemes  (see  section  4.1).  The  use  of  two  directional  filters  with  directional  derivatives  may 
be  as  efficient  as  the  Marr-Hildreth  scheme,  with  the  advantage  of  not  introducing  spurious 
edges  that  appear  with  rotational  filtering  because  of  the  closure  property  of  z.c.  contours 
(see  section  5). 


7.4.  Haralick 

Haralick  (1980.  1981.  1982)  has  proposed  a  scheme  for  edge  detection  in  which  a  pixel  is 
marked  as  a  step  edge  pixel  if,  in  its  neighborhood,  there  is  a  zero- crossing  of  the  second 
directional  derivative  taken  in  the  direction  of  the  gradient.  Haralick,  in  order  to  evaluate 
the  derivatives  he  approximates,  interpolates  the  sampled  intensity  values  with  discrete 
Chebychev  polynomials.  There  is  no  explicit  mention  of  a  filtering  step.  Canny  (1983), 
however,  has  shown  that  the  above  procedure  is  practically  equivalent  to  using  a  filtering 
step  (in  our  terms,  a  regularization  step)  before  differentiation.  The  type  of  equivalent  filter 
depends  on  the  set  of  approximating  functions  and  on  the  degree  of  differentiation  required. 

7.5.  Canny 

Canny  (1903)  has  investigated  the  desirable  properties  of  an  optimal  edge  detector,  based 
on  efficiency  of  detection  and  reliability  in  localization.  We  have  already  seen  that  detection 
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of  an  ideal  step  edge  is  favored  by  broad  filters  while  localization  is  favored  by  small  filters. 
Canny  has  shown  through  variational  methods  that  the  optimal  odd  filter  /„p(x)  (according 
to  his  criteria)  in  the  1-D  case  is  the  linear  combination  of  four  exponentials. 

_  ,3 

Interestingly  /op(x)  is  very  close  to  *r';rt  which  is  the  optimal  odd  filter  from  the  point  of 
view  of  minimal  uncertainty.  The  treatment  of  Canny  may  also  be  seen  as  a  well-founded 
justification  for  the  use  of  filters  with  minimal  uncertainty,  because  simply  by  first  changing 
some  constraints  in  his  variational  approach  it  is  possible  to  obtain  the  second  Hermite 
function.-’  Canny’s  procedure  for  finding  two-dimensional  step  edges  and  other  types  of 
edges  uses  directional  operators  of  varying  width,  length  and  orientation.  This  procedure, 
which  includes  as  an  essential  part  an  appropriate  thresholding,  works  remarkably  well 
on  real  images.  His  justification  of  the  choice  of  directional  operators  is  not  completely 
satisfactory.  Indeed: 

(1)  For  2-D  images,  Canny  uses  two  alternative  differential  operators,  either  ^  (see  section 
4.2  and  Havens  and  Strickwerda,  personal  communication)  or  directional  operators.  The 
preference  for  directional  operators  originates  from  his  one-dimensional  treatment  of  the 
problem.  The  optimal  filter  is  chosen  to  be  an  antisymmetric  function,  because  it  is  designed 
to  detect  maxima.  Therefore  the  corresponding  2-D  operator  is  not  rotationally  invariant, 
suggesting  the  use  of  directional  operators  for  2-D  images.  The  output  of  directional 
operators  can  be  directly  used  in  the  adaptive  threshold  scheme  used  by  Canny,  offering 
advantages  with  respect  to  the  symmetric  operator 

(2)  As  already  mentioned  in  section  4.1,  to  obtain  all  edges  in  a  2-D  image  it  is  sufficient 
to  use  only  two  different  directional  derivatives.  The  use  of  more  than  two  orientations  is 
useful  only  to  increase  the  signal-to-noise  ratio,  but  is  not  required  for  edge  detection  in  a 
noise-free  2-D  image. 


8.  DISCUSSION 

We  will  now  summarize  the  main  points  of  our  analysis  of  edge  detection. 

A.  The  first  step  in  edge  detection,  after  sampling  of  the  image,  consists  of  a  filtering 
stage  followed  by  a  differentiation  stage.  Filtering  has  the  main  function  of  regularizing 
the  ill-posed  nature  of  edge-detection  and  should  be  performed  before  the  differentiation 
operation.  Filtering  for  the  purpose  of  multiresolution  analysis  should  be  performed  after 
the  differentiation  operation,  when  nonlinear  differential  operators  are  used. 

D.  To  be  physically  realizable,  digital  filters  should  be  represented  with  a  good  approximation 
by  a  finite  sequence  of  samples  of  points.  From  this  point  of  view,  a  Gaussian  or  the  first 
linear  prolate  (4>„(i,f))  function  are  practically  equivalent.  Filtering  with  prolate  functions 
regularize  "more"  the  image  (the  image  becomes  entire  and  band-limited),  whereas  using 
a  Gaussian  the  image  becomes  only  entire.  The  Gaussian  filtering,  however,  has  two 
advantages  over  prolate  functions: 

(i)  It  does  not  create  z.c.  when  the  size  of  the  filter  is  increased  (see  section  6). 

(ii)  In  2-D,  the  Gaussian  decomposes  into  the  product  of  1-D  Gaussians:  as  a 
consequence,  it  is  particularly  easy  to  reduce  drastically  the  amount  of  computations 
involved  in  its  use. 

C.  Filtering  of  the  image  with  a  rotationally  symmetric  filter  insures  with  high  probability  the 
closure  of  z.c,  contours  (see  section  5).  Filtering  the  image  with  directional  filters  does  not 
ensure  closed  z.c.  contours.  Localization,  however,  is  more  accurate. 

D.  Several  types  of  derivatives  at  different  scales  may  be  needed  for  detecting  and  labeling 
intensity  changes  under  the  most  general  conditions.  In  the  differentiation  step,  directional 

Recently  Spacek  and  Brady  have  investigated  split  gaussian  tillers  similar  to  Canny's  but  with 
poorer  signal -to -noise  ratio  and  holler  localization. 
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derivatives  in  only  two  directions  are  necessary,  when  DD  operators  are  used.  When  RIO 
operators  are  used,  ~-t  performs  better  then  V2  in  localization,  but  — ;  has  the  disadvantage 
of  not  commuting  with  the  convolution. 

E.  In  order  to  characterize  the  types  of  intensity  changes  in  the  image  in  terms  of  the 
physical  properties  that  have  originated  them,  it  is  useful  to  have  a  set  of  hierarchical 
symbolic  descriptions.  The  lowest  symbolic  description  uses  as  a  substrate  the  associated 
fingerprints  of  the  image,  containing  the  map  of  zero  crossings  and  their  slope  at  different 
scales,  and  provides  a  local  labeling  of  edges  still  in  terms  of  image  data.  The  final  symbolic 
description  must  label  edges  in  terms  of  the  properties  of  the  physical  surfaces  that  originate 
the  intensity  changes,  and  therefore  as  object  boundaries,  shadows,  reflexes,  changes  in 
texture,  specular  reflections,  etc.  This  final  representation  of  the  type  of  the  primal  sketch 
is  obtained  using  high  level  knowledge  and  geometrical  reasoning  from  lower  symbolic 
descriptions. 

In  later  papers,  we  will  evaluate  performance  of  different  filters  and  different  operators  in 
real  images,  and  we  will  outline  a  theory  of  a  symbolic  description  of  edges. 

Acknowledgement:  We  are  grateful  to  A.  Yuille,  A.  Verri,  M.  Kass  for  useful  discussions  and 
suggestions.  M.  Bertero  first  pointed  out  to  us  that  differentiation  is  an  ill-posed  problem. 
M.  Brady,  J.  Canny,  E.  Grimson,  M.  Kass,  H.  Voorhees,  W.  Richards  and  especially  E. 
Hildreth  read  the  manuscript  and  provided  tremendously  useful  and  poorly  implemented 
suggestions.  Carol  Bonomo  typed  the  math,  edited  the  English,  and  generally  managed  to 
look  quite  busy,  even  if  she  wasn't. 

9.  References 

Abraham,  R.  and  Robbin,  J.  Transversal  mappings  and  flows.  W.  A.  Benjamin,  Inc.,  New 
York,  1967. 

Achieser,  N.  I.  Theory  of  Approximation.  Frederick  Ungar  Publishing  Co.,  New  York,  1956. 

Ahlberg.  J.  H„  Nilson,  E.  H.  and  Walsh,  J.  L.  The  Theory  of  Splines  and  their  Applications, 
(Vol.  38  in:  MATHEMATICS  IN  SCIENCE  AND  ENGINEERING),  Academic  Press,  New  York, 
1967. 

Asada  H.  and  Brady,  M.  The  curvature  primal  sketch.  A.I.  Memo  758,  MIT,  1984. 

Ballard,  D.  H.  and  Brown,  C.  Computer  Vision,  Prentice-Hall,  Englewood  Cliffs,  New  Jersey, 
1982. 

Babaud,  J.,  Witkin,  A.  and  Duda,  R.,  "Uniqueness  of  the  Gaussian  kernel  for  scale-space 
filtering,"  Fairchild  TR  645,  Flair  22,  1983. 

Berlins,  V.  "  Accuracy  of  Laplacian  edge  detectors."  Computer  Graphics  and  Image 
Processing,  27,  195-210,  1984. 

Binford,  T.  O.  "  Inferring  surfaces  from*  images."  Art.  Int.,  17,  205-244,  1981. 

Binford,  T.  O.  “  Survey  of  model-based  image  analysis  systems."  Int.  J.  Robotics  Res.,  1, 
no.  1,  18  64,  1982. 

Boas.  R.  P.  Entire  functions.  Academic  Press,  New  York,  1954. 

Borsellino,  A.,  Poggio,  T.  "Correlation  and  Convolution  algebras”,  Kybernetik,  13,  113-122, 
1973. 

Brady.  J  M.  "  Computational  Approaches  to  Image  Understanding"  Computing  Surveys. 
14.3  71.1932. 

Canny,  J  F  finding  edges  and  lines.  MIT  Technical  Report  No.  720,  1983. 


.  •  i 


29 


*.  s 


Torre,  Poggio 


On  Edge  Detection 


Davis,  L.  "A  survey  of  edge  detection  techniques."  Computer  Graphics  and  Image 
Processing,  4,  248-270,  1975a. 

Davis,  P.  J.  Interpolation  and  Approximation.  Dover  Publications,  Inc.,  New  York,  1975b. 

Daugman,  J.  G.  "Six  formal  properties  of  two-dimensional  anistropic  visual  filters:  Structural 
principles  and  frequency/orientation  selectivity."  IEEE  Systems,  Man  and  Cybernetics,  in 
press,  1984. 

Daugman,  J.  G.  "Uncertainty  relation  for  resolution  in  space,  spatial  frequency  and 
orientation  optimized  by  two-dimensional  visual  cortical  filters."  J.  Opt.  Soc.  Am,  in  press, 
1984a. 

Frieden,  B.  R.  "Linear  and  circular  prolate  functions",  in  Progress  in  Optics  IX,  ed.  E. 
Wolf,  North-Holland  Publishing  Co.,  Amsterdam,  312-408,  1971. 

Greville,  T.  N.  E.  "Introduction  to  spline  functions",  in  Theory  and  Applications  of  Spline 
Functions,  ed.  T.  N.  E.  Greville,  Academic  Press,  New  York  -  London,  1-36,  1969. 

Haralick,  R.  M.  "Edge  and  region  analysis  for  digital  image  data.”  Compt.  Graphics  &  Image 
Proc,  12,  60-73,  1980. 

Haralick,  R.  M.  "The  digital  edge."  Proc.  1981  Coni,  on  Pattern  Recognition  &  Image 
Processing,  Dallas,  Texas,  285-294,  1981. 

Haralick,  R.M.  "Zero-crossings  of  second  directional  derivative  edge  operator.”  SPtE  Proc. 
on  Robot  Vision,  Arlington  Virginia,  1982. 

Hermuth,  H.  H.  Transmission  of  information  by  orthogonal  functions.  Springer- Verlag,  Berlin, 
1972. 

Herskovitz,  A.  and  Binford,  T.  O.  "On  boundary  detection."  A.  I.  Memo  183,  MIT,  1980. 
Hildreth,  E.  C.  "Implementation  of  a  theory  of  edge  detection."  A.  I.  Memo  579,  MIT,  1980. 
Horn,  B.  K.  P.  "The  Binford-Horn  edge  finder."  A.  I.  Memo  285,  MIT,  1972. 

Koenderink,  J.J.  and  van  Doom,  A.  J.  "The  structure  of  two-dimensional  scalar  fields  with 
applications  to  vision.  "  Biol.  Cyb.,  1979. 

Kolmogorov,  A.  and  Fomine,  S.  Elements  de  la  theorie  des  functions  et  de  I' analyse 
fonctionelle.  Editions  MIR,  Moscov,  1974. 

Landau,  H.  J.  and  Pollack,  H.  O.  "Prolate  spherical  wave  functions,  Fourier  analysis  and 
uncertainty-ll."  Bell  Syst.  Tech.  J.,  40,  65-84,  1961. 

Lipschutz,  M.  M.  Differential  geometry.  McGraw-Hill,  New  York,  1969. 

Lowe,  D.  G.  and  Binford,  T.  O.  "Interpretation  of  geometric  structure  from  image  boundaries." 
SPIE,  281,  224-231,  1981. 

Lunscher,  W.H.H.  "The  asymptotic  Optimal  Frequency  Domain  Filter  for  Edge  Detection." 
IEEE  Trans.  PAMI,  6,  678-680,  1983. 

Marr,  D.  C.  and  Hildreth,  E.  C,  "Theory  of  edge  detection."  Proc.  R.  Soc.  Lond.  B,  207, 
187-217,  1980. 

Marr,  D.,  Poggio,  T.  "A  computational  theory  of  human  stereo  vision."  Proc.  R.  Soc.  Lond. 
B,  204,  301-328,  1979. 

Oppenheim.  A.  V.  and  Johnson,  D.  H.  "Discrete  representation  of  signals."  Proc.  IEEE,  60, 
no.  6,  681-691,  1972. 

Papoulis,  A.  Probability.  Random  Variables  and  Stochastic  Processes,  McGraw-Hill,  New 
York,  1965. 

Papoulis.  A.  The  Fourier  Integral  and  its  Applications,  McGraw-Hill,  New  York,  1962. 
Pavlidis,  T.t  Structured  Pattern  Recognition,  Springer  Verlag,  New  York,  1977. 


30 


Torre,  Poggio 


On  Edge  Detection 


Poggio,  T.,  Torre,  V.  “Ill-Posed  Problems  and  Variational  Principles  in  Vision",  A.  I.  Memo 
773,  MIT,  1984. 

Poggio,  T.,  Voorhees,  H.  and  Yuille,  A.  "A  regularized  solution  to  edge  detection",  in 
preparation,  1984 

Pogorelov,  A.  V.  Differential  geometry.  P.  Moozdhoff  N.  V.  Groningen,  1965. 

Poston,  T.  and  Stewart,  I.  N.  Taylor  expansions  and  catastrophes.  Pitman  Publishing, 
London,  1976. 

Rosenfeld,  A.  "Quadtrees  and  Pyramids:  hierarchical  representation  of  images"  TR  1171, 
University  of  Maryland,  1982 

Rosenfeld,  A.  and  Kak,  A.  C.  Digital  Picture  Processing,  second  edition,  Academic  Press, 
New  York,  1982. 

Schoenberg,  I.  J.,  “Contributions  to  the  problem  of  approximation  of  equidistant  data  by 
analytic  functions."  Quart.  Appl.  Math.,  4,  45-99,  112-141,  1946. 

Schoenberg,  I.  J.,  “Spline  functions  and  the  problem  of  graduation."  Proc.  nat.  Acad.  Sci. 
USA,  52,  947-950,  1964. 

Shanmugam,  K.  F.,  Dickey,  F.  M.  and  Green,  J.  A.  "An  optimal  frequency  domain  filter 
for  edge  detection  in  digital  pictures."  IEEE  Trans.  Pattern  Anal.  &  Mach.  Intell.,  PAMI-1, 
37-49,  1979. 

Slepian,  D.  "Some  asymptotic  expansions  for  prolate  spheroidal  wave  functions."  J.  Math 
&  Phys.,  44,  99-149,  1965. 

Stansfield,  J.  L.  "Conclusions  from  the  commodity  expert  project",  A.  I.  Memo  601,  MIT, 
1980. 

Thom,  R.  "Quelques  proprietes  globales  des  varietes  differentiables."  Comm.  Math.  Helv., 
28,  17  86,  1954. 

Tikhonov,  A.  N.  “Regularization  of  incorrectly-posed  problems."  Soviet  Math.  Dokl.,  4, 
1624-1627,  1963. 

Tikhonov,  A.  N.,  Arsenin,  V.  Y.  Solution  of  Ill-Posed  Problems,  Winston  and  Wiley  Publishers, 
Washington,  1977. 

Widder,  D.V.  The  heat  equation  ,  Acad.  Press,  New  York,  1975. 

Witkin,  A.  “Scale-Space  Filtering",  Proceedings  of  IJCAI,  1019-1021,  Karlsruhe,  1983. 

Yuille,  A.L.  and  Poggio,  T.  “Scaling  Theorems  for  Zero-crossings".  A.  I.  Memo  722,  MIT, 
1983a. 

Yuille,  Poggio,  “Fingerprints  Theorems  for  zero-crossings”,  A.  I.  Memo  730,  MIT,  1983b. 


1.  Appendix:  Differentiation  through  Taylor  expansion 


In  previous  sections,  we  have  seen  that  to  safely  perform  differentiation,  it  is  necessary  to 
smooth  the  data  by  some  appropriate  (analog  or  digital)  filtering.  If  this  filtering  has  removed 
enough  high  frequencies,  so  that  our  filtered  image  is  band-limited  function,  and  by  the 
Paley-Wiener  theorem  (Boas,  1954)  is  also  an  entire  function  (see  Section  3),  numerical 
differentiation  can  be  performed  in  a  computationally  more  efficient  way  through  Taylor 
expansion. 

If  /(x)  is  entire,  than  /(x)  is  also  analytic  and  the  Taylor  series 

/(x)  =  /* + (x  -  xon  +  •  •  ■  +  ^4r')/*‘)  +  •  •  •  l'l 

has  an  infinite  radius  of  convergence.  If  we  have  2»  +  I  sampled  points  from  [2],  we  can 

obtain  2n  linear  equations  from  which  we  can  solve  for  / jk,),  j  =  1,2 . jin. 

Three  equidistant  points  give 

I'k  -  ,j-(A+i  -/*-.) 

t  [3] 

f"k  —  I~i(fk-l  -  2/a  +  /*  +  ,) 

With  five  equidistant  points,  we  obtain 

fk  —  T5r(/k-2  -  */*-•  +  _  A-*) 

Ul  W 

K  -  |2^5 (“A-*  +  I6A-.  -  30/a  +  16/*+  ■  -  /*+2) 

When  the  performances  of  the  numerical  differentiation  obtained  through  spline  interpolation 
(eqs.  2.9  •  2.10)  are  compared  with  those  obtained  by  Taylor  expansion  (equations  3-4  in 
this  appendix),  it  turns  out  that  the  first  method  gives  more  accurate  and  consistent  results 
with  noisy  data  while  the  second  method  is  more  efficient  with  data  that  are  already  smooth. 
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2.  Appendix:  Sampling 


Since  image  processing  is  performed  in  terms  of  discrete  representations  of  signals  and 
filters,  it  is  important  that  manipulations  of  sampled  images  and  filters  have  a  meaningful 
connection  with  the  original  image  /(x)  and  the  analytic  form  of  the  filter  /(x).  More  precisely, 
for  linear  filtering,  if  /,  is  a  discrete  sequence  of  points  of  /(z),  and  /,  is  another  discrete 
sequence  of  points  of  /(z),  the  discrete  convolution 

ffi  — '  4  •  Ik-i  1*1 

k 


should  be  related  to  the  exact  convolution 

g(T)  =  J  l(y)f(y  -  x)dy  =  /(*)  *  f(x).  [2] 

This  relation  is  clarified  by  standard  results  (Oppenheim  and  Johnson,  1972;  Borsellino  and 
Poggio,  1973). 

Suppose  that  we  may  represent  /(z)  and  f(x)  as 


I(x)  =  £ 

« 

[3a] 

/(*)  =  DC  f*Mx) 

[36] 

where  <t>i(x)  =  sinc[  j[(z  -  *A)j.  Then 


/(z)  *  /(z)  =  g(z)  =  £g,&(z),  [1] 

t 

where  g,  =  /*/*_,. 

Thus  from  the  discrete  convolution  of  the  sampled  values  [2],  it  is  possible  to  recover 
completely,  the  exact  convolution  of  the  original  image  with  the  filter.  It  is  now  possible  to 
represent  a  signal  In  the  form  of  equation  [1]  when  the  signal  is  band-limited  and  correctly 
sampled.  If  one  uses  band-limited  filters  it  is  possible  to  obtain  the  required  representation 
[3b]  for  the  filter.  For  an  arbitrarily  sampled  image,  however,  it  is  difficult  to  obtain  the 
required  representation  [3a],  Indeed  it  would  be  necessary  to  sample  the  image  according 
to  the  cutoff  frequency  of  the  optical  system  used  in  the  imaging  process,  which  is  generally 
too  high  to  be  of  practical  use.  This  is  related  to  the  classical  problem  of  aliasing.  The 
simplest  way  to  obtain  a  reasonable  solution  to  the  problem  is  to  initially  filter  the  image 
with  an  appropriate  band -limited  filter,  before  any  further  operation. 
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3.  Appendix:  The  geodesic  curvature 


tt  may  be  of  some  interest  to  ask  whether  the  iine  on  the  surface  X  -  (x,  *,/(*,»)).  whose 
tangent  is  always  in  the  direction  of  the  gradient  is  also  a  geodesic.  A  geodesic  is  aline 
whose  geodesic  curvature  K„  is  always  zero.  In  what  follows  we  will  briefly  answer  this 
question  The  surface  -V  has  the  first  fundamental  form  [4.6], 

As  shown  later,  the  geodesic  curvature  of  a  curve  on  surface  X  whose  tangent  is  always 
in  the  direction  of  the  gradient  is 

“  *)  +  7*V**-f**)]  *  m 

K»  =  Ti/i"  1 

{*+£[•  +/IUI  +  /J)lj 

Now,  Ka  =  0,  when  /*  =  fl  and  fxx  =  Jyv.  that  is,  for  surface  of  revolution.  Therefore  the 
line  on  surface  A’,  which  is  always  directed  along  the  gradient,  has  the  following  properties. 

(1)  its  normal  curvature  Kn  is  equal  to  ^  X 

(2)  it  is  a  geodesic,  only  for  surface  of  resolution. 

Let  us  now  compute  K,.  Given  that  the  coefficients  of  the  first  fundamental  form  are 


E  =  \  +  fl  F  —  Jzft  G  =  l  +  /J, 
The  associated  Cristoffel  symbols  I'j,  I'?,  satisfy  the  equations 

ri.w'  +  'V1  = 

r| ,nr  +  V3UEG  =  Fz E -  X-EEy. 

or 

i(FEz  +  EEy)  -  FZE 
1  n  -  f’*  _  kg 

_  i  k,  _  -  fe,»)  - 

"  2  E  F--f- 

Finally, 


[2] 

pi 


w 


ir»] 


,  .](/,/» 2/x/x,  -t  (I  4  f:)VJry)  -  (I  +  fpUJy  +U'v) 

'** -  nn-  ^n  -n-nn  m 

Jifuf'i  “  (|  ♦  f',)fnfy  _  /»//» 

:V* 

I  I  Ej  .,J  /■'  fifit  July  f if V 

"  ‘  2  e  "  E  i  i  f  'i  r  1  1  f  i 

fifn{t  t  /;  i  /;)  fixf‘yIi_  |7j 

(l  i  fi)>r  . 

I  if,  i 
>r 
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Now 


r'V  -  2/V  +  Ay'  -  M  =-'f  (‘  -  7f  )  +  jMw  -  /--) 


+  “57-  /./„  +  2/,/„&  +  /,/„,§ 


gtJ'Y*'™  '  J  '  JXJVV  yl 

~  +  2/yflvf  +  fvfvv'Ji 

=  7f(/f _  ’) +  n[Jvv  ~ fxx) + ?{0}' 


y/KG  -  F*  =  g 


and 


AV*  +  2  Fx'y'  +  G'l  =  (l  +  /’)  +  2/./M  +  (l  +  fl)y- 
=  1  4-  S\  +  2/J  + 

=  ^{/J,  +  /i+2/*/?  +  i+/i} 

=  j*{/U' + /i) +/*  +  \/un + /?) + 1>. 

Finally,  we  have 

g  [7^(7 f  _  0  +  ~ 

*'  =  {-frifUi  +  n  +  fD+'  +  n/i  +  fi}}*'2 

9[it{ffi  -  0+ iw»v- 1**) 

=  Tr+^{i+ /j(/T+  /i)};,/7r 

The  geodesic  curvature  A',  is  zero  when 

Eq.  [20]  is  not  generally  satisfied;  it  is  satisfied  for  surfaces  such  that 

/J  «  /; 

/»»  //» 

as  a  sphere  or  a  surface  of  revolution. 


(I7j 


[18] 


[19] 


[20] 


(M) 
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4.  Apsendix:  Approximation  and  interpolation 


The  traditional  procedure  for  performing  numerical  differentiation  of  sampled  functions 
is  to  interpolate  using  polynomials  or  related  functions  and  to  analytically  differentiate 
the  interpolated  function.  This  procedure  is  usually  justified  if  the  interpolated  function 
converges  uniformly  to  the  original  function  as  the  number  of  samples  increases.  Most 
of  the  classical  results  in  the  interpolation  and  approximation  of  functions  deal  with  this 
problem.  In  the  case  of  images  the  main  problem  is  however  different:  approximation  for 
the  puraose  of  differentiation  has  to  be  robust  against  noise.  The  solution  to  this  problem 
has  to  be  sought  in  regularization  theory,  as  we  explained  earlier.  In  this  appendix,  we 
discuss  some  of  the  classical  results  on  interpolation  and  approximation  for  completeness 
and  no  because  they  are  directly  relevant  to  the  problem  of  regularizing  edged  detection. 
The  reader  will  notice,  however,  that  there  are  several  connections  between  the  classical 
results  outlined  here  and  our  approach  described  in  the  the  main  body  of  this  paper. 

Uniforrr  convergence  for  polynomial  interpolation  is  not  guaranteed  even  when  the  original 
function  is  Cx ,  or  when  it  is  analytical.  Uniform  convergence  on  bounded  sets  requires  the 
original  function  to  be  entire.  From  the  Paley-Wiener  theorem  (Boas,  1954)  we  know  that 
this  is  tie  case  with  band-limited  functions. 

If  the  original  function  is  analytic,  numerical  differentiation  may  be  performed  using  an 
appropiiate  Taylor  expansion  without  interpolation.  Interestingly,  almost  every  function  is 
made  analytic  by  filtering  with  a  Gaussian,  since  the  filtered  function  is  a  solution  of  the 
heat  equation  (Widder,  1975).  Thus,  in  order  to  safely  perform  numerical  differentiation,  it 
is  nece  jsary  to  use  band-limited  or  Gaussian  filters. 

Note  that  if  approximation  (in  the  Weierstrass-Bernstein  sense)  rather  than  interpolation  is 
used,  v'e  obtain  uniform  convergence  for  all  continuous  bounded  functions  on  bounded 
sets.  Therefore,  differentiation  through  approximation  is  successful  on  bounded  sets  for  all 
bounded  C'  functions.  We  will  see,  however,  that  convergence  is  too  slow  for  this  approach 
to  be  p  actical. 

In  what  follows,  we  will  use  a  one-dimensional  approach  for  the  sake  of  simplicity,  but  all 
our  conclusions  and  results  can  be  easily  extended  to  two  dimensions. 


4.1.  Interpolation  and  differentiation 


Consider  a  function  f(s)  defined  in  [«,6j  and  have  «<  *0  <  *t  <**<...<*»<...  < 
tu  <-  h  distinct  points,  and 


h  -  /(**)  [I] 

the  values  of  /  at  sk.  It  is  well  known  that  there  exists  a  polynomial  r„(j-)  of  degree  u  such 
that  for  the  given  values  x„  <•  r,...  v  r„  takes  the  values  «/o,j/i •••,?/.,  (Davis,  1975).  This 
polynomial  is 

n 

i’-.(-r)  \2\ 

II 

whore  I  (./  )  are  Lagrange  polynomials 
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.  /  *  _ (f:  -  *oX*-  x,)...(r-  x*-iX*  -  Tk+,)...(x  -  x„) 

k(  1  (X*  -  *„)(**  -  X|).  -xt+1).  ..(**-*„)•  1,,] 

From  equation  [2]  and  [3],  we  consider  the  best  way  to  estimate  the  derivative  of  /(x)  in 
**./'(**)  by  computing 


Pn(xk)  =  ^2ykl'k(x)  *=„.  HJ 

o 

In  order  to  use  this  procedure  reliably  we  need  to  know  that  in  some  way  p„(x)  is  a  good 
approximation  of  /(x)  outside  the  sampled  points.  We  would  like  to  know  that  by  increasing 
the  number  n  of  sampled  points  xn  in  [a,  6],  we  have 

lim„_oo  Pn(*)  =  /(*)•  [5] 

This  is  equivalent  to  uniform  convergence.  At  the  beginning  of  the  century,  Bernstein  (see 
P.  Davis,  1975)  proved  that  equidistant  interpolation  over  )x|  <  l  to  the  function  y  —  |x| 
diverges  for  0  <  |x|  <  I;  that  is,  continuity  of  f(x)  is  not  sufficient  to  ensure  uniform 
convergence.  Runge  showed  that  even  if  /(x)  is  analytical  in  [o,fc]  uniform  convergence 
may  fail  (P.  Davis,  1976).  For  the  function  f(x)  =  1/(1  +  x2),  Runge  showed  that  if  pn(x ) 
interpolates  /(x)  at  equidistant  points  in  [-5,5],  p„(x)  converges  to  /  only  in  |x|  <  3.63.. .  and 
diverges  outside  the  interval.  Although  f(x)  is  analytic  in  S  is  not  analytic  in  the  complex 
plane  C  and  the  singular  points  ±i  induce  this  divergence  (see  P.  Davis,  1975).  To  obtain 
uniform  convergence  of  pn(x)  to  f{x)  in  [a,  6),  it  is  necessary  for  /(z)  to  be  analytic  in  a 
subregion  of  the  complex  plane  C  containing  the  segment  [a,  6]  (see  Theorem  4.3.1  of  P. 
Davis,  1975). 

It  may  be  useful  to  remember  that  polynomial  interpolation  is  not  optimal  if  the  sampled 
points  are  equidistant.  Polynomial  interpolation  is  optimal  if  interpolation  of  /(x)  in  [0, 1] 
of  order  n  is  carried  at  the  zeros  of  the  Chebychev  polynomials  7’„(x).  In  general,  the 
procedure  of  differentiation  through  interpolation  with  polynomial  or  related  functions  may 
badly  fail  when  applied  to  arbitrary  sampled  functions. 


4.2.  Differentiation  of  analytic-,  entire-  and  band-limited  functions 


If  we  know  that  f(x)  is  analytic  in  [a,  b]  and  we  have  no  information  about  its  behavior  on 
the  complex  plane  C  we  cannot  safely  use  differentiation  through  interpolation.  If  /(x)  is 
analytic  in  xk  €  [a,  6],  we  have 


/(x)  =  /(x*)  +  (x  -  x*)/">(xk)  +  • . .  +  ^L~-r/,n)(**)  +  •  ••  l«] 

in  \xk  -  h,  xk  i  *].  If  we  have  2n  +  I  sampled  points  in  |x*  -  fi,xk  -i  A),  from  equation  [6],  we 
can  obtain  2w  linear  equations,  from  which  we  can  solve  for  /u>(sk),j  --  I. .  :Jn.  In  the  case 
of  three  or  five  equidistant  points  we  obtain  the  formulae  shown  in  Appendix  1. 

The  main  problem  with  this  procedure  of  numerical  differentiation  is  that  we  do  not  generally 
know  the  radius  of  convergence  of  the  Taylor  expansion  (6]  and,  given  an  arbitrary  sampled 
analytical  function,  we  do  not  know  how  many  points  around  x*  fall  in  the  convergence 
interval.  When  the  class  of  original  functions  /  is  further  restricted  to  entire  functions,  the 
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Taylor  expansion  [6]  is  valid  in  R,  that  is,  it  has  an  infinite  radius  of  convergence,  and  the 
above  procedure  can  be  carried  out  safely. 

Now  let  us  suppose  that  /(z)  is  entire,  /«/,*(»),  and 

Iim»~oo3up<w^"~  =  6  [7] 

P 


where 


M(p)  =  max|,|=p  |/(«)| 


[») 


with  z  e  C  and  /  extended  to  the  complex  plane. 

If  6  <  oo  then  f(x)  is  said  to  be  of  exponential  type  6  and  by  the  Paley-Wiener  Theorem 
(Boas.  1954;  Achieser,  1956),  f(x)  is  band  limited  with  F(u>)  =  0  for  |u|  >  S,  where  F(w) 
is  the  Fourier  transform  of  /(z).  The  Paley-Wiener  Theorem  represents  the  connection 
between  entire  and  band  limited  functions.  It  may  be  useful  to  remember  that  the  Gaussian 
is  entire  and  belongs  to  //*(R)  but  falls  off  just  too  quickly  to  be  of  finite  exponential  type 
and  therefore  to  be  band-limited. 

If  f(x)  is  band-limited  (with  cutoff  frequency  u>„),  and  /(z)  has  been  correctly  sampled  at 
equidistant  points  spaced  h,  the  Shannon  Sampling  Theorem  gives  us 


-f-OO 


/(*)  =  53  /«sinc 


-oo4 


5(-*) 


where  sincz  =  In  this  case,  we  can  compute  f'k  exactly  from  [9]  as 


l») 


V  -  A  —  * 

-OO.^fc 

-  (/*  + 1  -  fk-l)  -  ^(fk+2  -  fk-2 )  +  g(/*+ 1  -  fk-i) 


[10  <1 


Although  [10]  is  exact  for  correctly  sampled  band -limited  functions,  it  converges  rather 
slowly. 


4.3.  Approximation  and  Differentiation 


It  is  well  known  that  if  f(r)  is  continuous  in  [«,/>],  for  every  given  «  >  0,  we  can  find  a 
polynomial  j>„(z)  of  sufficiently  high  degree  such  that 

I/M  “  PnMI  <  «  «  <  *  <  b.  |IH] 

This  is  the  so-called  Weierstrass  approximation  theorem.  Now  let  us  suppose  that  we  have 
an  /(.r)  continuous  and  bounded  in  |0, 1]  and  we  know  the  values  of  }{s )  at  equidistant 
points  xk  *.  Instead  of  interpolating  a  polynomial  through  the  known  points,  we  can 
construct  the  Bernstein  polynomial: 
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»-{*)=  I”] 

k=0 

Observe  that  n„(0)  =  /(o)  and  H„(l)  =  /(l),  but  apart  from  0  and  1  «„(*)  is  not  in  general 
equal  to  /(z*).  A  fundamental  theorem  of  Berstein  shows  that 

lim n~coDn(x)  =  :(x)  in  [0,1].  [13] 

Moreover,  if  f(x)  e  C\  we  have 

limB^QOB'n(z)  =  /'(x)  in  [0,1].  [14] 

Thus,  Bernstein  polynomials  provide  simultaneous  approximation  of  the  function  and  its 
derivatives.  If  we  want  to  obtain  an  estimation  of  the  derivative  of  a  sampled  function,  we 
can  construct  the  Bernstein  polynomial  from  the  sampled  values,  compute  the  analytical 
derivative  of  /i„(z)  and  take  it  as  an  estimate  of  /'(x).  The  drawback  of  Bernstein  polynomials 
is  that  their  convergence  is  very  slow  as  shown  by  the  poor  performance  of  a  3-  or  5-  point 
approximation.  Therefore,  if  we  have  samples  of  a  generic  function  of  class  C1,  we  can 
use  Bernstein  polynomials  to  obtain  an  estimation  of  the  values  of  derivatives  at  sampled 
points,  but  many  points  are  needed  for  a  good  estimate.  If  we  have  samples  of  an  entire 
function,  the  most  efficient  procedure  is  to  use  the  equations  derived  in  Appendix  1 . 
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